Aldehyde Tags, Uses Thereof in Site-Specific Protein Modification

ABSTRACT

The invention features compositions and methods for site-specific modification of proteins by incorporation of an aldehyde tag. Enzymatic modification at a sulfatase motif of the aldehyde tag through action of a formylglycine generating enzyme (FGE) generates a formylglycine (FGly) residue. The aldehyde moiety of FGly residue can be exploited as a chemical handle for site-specific attachment of a moiety of interest to a polypeptide.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority benefit of U.S. provisional application Ser. No. 60/846,644, filed Sep. 21, 2006, which application is incorporated herein by reference in its entirety.

GOVERNMENT RIGHTS

This invention was made with government support under federal grant no. R01-AI051622 awarded by the National Institutes of Health. The United States Government has certain rights in this invention.

BACKGROUND

Site-specific labeling of proteins is important as a tool for the dissection of biochemical and cellular networks. A variety of technologies have been developed to address this need. One such technology used for protein localization and tracking is labeling with fluorescent proteins, such as green fluorescent protein (GFP). However, the size of these fluorescent proteins can interfere with the trafficking, localization and protein-protein interactions of the target (Lisenbee et al. Traffic 2003, 4, (7), 491-501).

As a result, many groups have focused their attention on using smaller fusions to direct specific secondary labeling reagents. FlAsH, developed by Roger Tsien and colleagues, utilizes the interaction between specifically arranged tetracysteine motifs and biarsenyl-fluorophores (Chen et al. Science 1998, 281, (5374), 269-272). Despite picomolar affinity between the minimal 8 amino acid sequence and bi-arsenical probes (Adams et al. J. Amer. Chem. Soc. 2002, 124, (21), 6063-6076), background due to native cysteine motifs remains a problem (Stroffekova et al. Pflugers Archiv-Eur. J. Physiol. 2001, 442, (6), 859-866).

To increase specificity, peptide targeting motifs that depend upon secondary labeling by enzymes have been explored. One such strategy depends upon the fusion with O⁶-alkylguanine-DNA transferase (hAGT), which can ligate a wide variety of small molecules to an internal cysteine. While hAGT fusions allow very specific, covalent attachment of a wide variety of small molecule probes, this approach relies upon a 207 amino acid fusion (George et al. J. Amer. Chem. Soc. 2004, 126, (29), 8896-8897; Guignet et al. Nature Biotechnol 2004, 22, (4), 440-444). In a separate approach, protein fusions with the approximately 80 amino acid acyl carrier protein can be specifically labeled with CoA-derived probes using the enzyme phosphopantetheine transferase. Alternatively, biotin ligase has been used to transfer biotin or a ketone-containing biotin isostere to a 15 amino acid acceptor peptide. Appendage of the ketone isostere allows the formation of hydrazone and oxime conjugates.

There is a need for new approaches for site-specific modification of proteins.

Literature

Adams et al. 2002 J. Amer. Chem. Soc. 2002, 124, (21), 6063-6076; Banghart et al. 2004 Nat. Neurosci. 7(12):1381-6. Epub 2004 Nov 21; Berteau et al. 2006 J Biol Chem. 281(32):22464-70 (Epub 2006 Jun. 9); Chen et al. 2005 Nature Methods 2005, 2, (2), 99-104; Cosma et al. 2003 Cell 113, (4), 445-56; Dierks et al. 1997 Proc Natl Acad Sci USA 94, (22), 11963-8; Dierks et al. 2003 Cell 113, (4), 435-44; Dierks et al. 2005 Cell 121, (4), 541-52; George et al. 2004 J. Amer. Chem. Soc. 126, (29), 8896-8897; Griffin et al. 1998 Science 281, (5374), 269-272; Guignet et al. 2004 Nature Biotechnol. 22, (4), 440-444; Landgrebe et al. 2003 Gene 316:47-56; Lemieux (1998) Trends Biotechnol 16, 506-13; Lisenbee et al. 2003 Traffic 4, (7), 491-501; Mariappan et al. 2005 J. Biol. Chem. 280(15):15173-9 (Epub 2005 Feb. 11); Mougous et al. 2004 Nat. Struc. Mol. Biol. 11, 721-729; Preusser et al. 2005 J. Biol. Chem. 280(15):14900-10 (Epub 2005 Jan. 18); Roeser et al. 2006 Proc Natl Acad Sci USA 103(1):81-6 (Epub 2005 Dec. 20); Rush et al. (Jan. 5 2006) Org Lett. 8(1):131-4; Sardiello et al. 2005 Human Mol. Genet. 14, 3203-3217; Schirmer et al. 1998 Chemistry & Biology 5, R181-R186; Schmidt et al. 1995 Cell 82, (2), 271-8; Stroffekova et al. 2001 Archiv-Europ. J. Physiol. 442, (6), 859-866; Szameit et al. 1999 J Biol Chem 274, (22), 15375-81; Yin, J. et al. 2005 Proc. Natl. Acad. Sci. USA 102, 15815-15820 (2005); US20050026234; US20030186229; and U.S. Pat. No. 6,900,304.

SUMMARY

The invention features compositions and methods for site-specific modification of proteins by incorporation of an aldehyde tag. Enzymatic modification at a sulfatase motif of the aldehyde tag through action of a formylglycine generating enzyme (FGE) generates a formylglycine (FGly) residue. The aldehyde moiety of FGly residue can be exploited as a chemical handle for site-specific attachment of a moiety of interest to a polypeptide.

Accordingly, the present disclosure provides methods for modifying a polypeptide, the method comprising contacting a polypeptide comprising a converted sulfatase motif with a reactive partner comprising a moiety of interest, wherein the converted sulfatase motif comprises:

X₁(FGly)X₂Z₂X₃R   (I)

where

FGly is a formylglycine residue;

Z₂ is a proline or alanine residue;

X₁ is present or absent and, when present, is any amino acid, with the proviso that when the heterologous sulfatase motif is at an N-terminus of the polypeptide, X₁ is present; and

X₂ and X₃ are each independently any amino acid;

wherein said contacting is under conditions sufficient for conjugation of the moiety of interest of the reactive partner to FGly of the polypeptide, thereby producing a modified polypeptide.

The sulfatase motif can be a heterologous sulfatase motif. Furthermore, the FGly residue can be positioned at an internal sequence of the polypeptide, and/or positioned at a terminal loop, a C-terminus, or an N-terminus of the polypeptide. Of particular interest are situations in which the FGly residue is present on a solvent-accessible region of the polypeptide when folded. Further of interest are situations in which the FGly residue is present at a site of post-translational modification of the polypeptide, such as a glycosylation site. These sites of post-translational modification can be native to the parent polypeptide, or the polypeptide can be engineered to include one or more non-native sites of post-translational modification, and the heterologous sulfatase motif is positioned at said one or more non-native sites of post-translational modification.

Of particular interest are sulfatase motifs where X₁, when present, is L, M, V, S or T. Further sulfatase motifs of particular interest are those where X₂ and X₃ are each independently an aliphatic amino acid, a polar, uncharged amino acid, or a sulfur containing amino acid (i.e., other than an aromatic amino acid or a charged amino acid), and in certain embodiments are each independently S, T, A, V, G or C.

The present disclosure also provides methods for producing a formylglycine in a polypeptide, the method comprising contacting a polypeptide comprising a heterologous sulfatase motif with a formylglycine generating enzyme (FGE), wherein the heterologous sulfatase motif is of the formula

X₁Z₁X₂Z₂X₃R   (I)

where

Z₁ is cysteine or serine;

Z₂ is a proline or alanine residue;

X₁ may be present or absent and, when present, is any amino acid, with the proviso that when the heterologous sulfatase motif is at an N-terminus of the polypeptide, X₁ is present;

X₂ and X₃ are independently any amino acid, wherein said contacting is under conditions sufficient for conversion of Z₁ to a formylglycine (FGly) residue in the polypeptide and produces a converted aldehyde tagged polypeptide.

The polypeptide used in this method can have at least one of the following properties: the heterologous sulfatase motif is less than 16 amino acid residues in length, the heterologous sulfatase motif is positioned at an N-terminus of the polypeptide, the heterologous sulfatase motif is positioned at an internal site of an amino acid sequence native to the polypeptide, the heterologous sulfatase motif is positioned in a terminal loop of the polypeptide, the heterologous sulfatase motif is positioned at a site of post-translational modification of the polypeptide; the polypeptide is a full-length polypeptide, the polypeptide is other than a preprolactin polypeptide, a prolactin polypeptide, or a glutathione-S-transferase polypeptide.

The heterologous sulfatase motif can be less than 16 amino acid residues in length and can be positioned at a C-terminus of the polypeptide. The heterologous sulfatase motif can be present at an internal site in a terminal loop of the polypeptide and/or is present at an internal site within an extracellular loop or an intracellular loop. The heterologous sulfatase motif can be present at an internal site or at the N-terminus, and/or can be solvent-accessible when the polypeptide is folded. The heterologous sulfatase motif can be present at a site of post-translational modification, such as a glycosylation site. The site of post-translational modification can be native to the parent target polypeptide or the target polypeptide can be engineered to include one or more non-native sites of post-translational modification, and wherein the heterologous sulfatase motif is positioned at said one or more non-native sites of post-translational modification.

Of particular interest are sulfatase motifs where X₁, when present, is an aliphatic amino acid, a sulfur-containing amino acid, or a polar, uncharged amino acid, (i.e., other than an aromatic amino acid or a charged amino acid), and may in certain embodiments be L, M, V, S or T. Further sulfatase motifs of particular interest are those where X₂ and X₃ are each independently an aliphatic amino acid, a polar, uncharged amino acid, or a sulfur containing amino acid (i.e., other than an aromatic amino acid or a charged amino acid), and in certain embodiments are each independently S, T, A, V, G or C. In one embodiment of interest, the polypeptide is expressed in a cell containing the FGE.
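As an illustration only, the unconverted motif X₁Z₁X₂Z₂X₃R described above can be located in an amino acid sequence with a short pattern search. The following Python sketch is not part of the disclosed methods; the function name `find_sulfatase_motifs`, the tuple layout, and the restriction to the 20 standard one-letter residue codes are assumptions made for the example. It reports the zero-based position of Z₁, the matched motif (including X₁ when a preceding residue exists), and whether X₁, X₂ and X₃ all fall within the residue sets named above as being of particular interest.

```python
import re

# Hypothetical helper for illustration; not taken from the disclosure.
ANY = "[ACDEFGHIKLMNPQRSTVWY]"  # assumption: 20 standard residues only

# Core of the motif: Z1 (C or S), X2 (any), Z2 (P or A), X3 (any), R.
# The lookahead lets overlapping candidates be reported.
CORE = re.compile(rf"(?=([CS]{ANY}[PA]{ANY}R))")

PREFERRED_X1 = set("LMVST")     # X1 residues named as of particular interest
PREFERRED_X2_X3 = set("STAVGC")  # X2/X3 residues named as of particular interest


def find_sulfatase_motifs(seq):
    """Return (z1_index, motif, preferred) for each X1-Z1-X2-Z2-X3-R hit."""
    seq = seq.upper()
    hits = []
    for m in CORE.finditer(seq):
        z1 = m.start()
        # X1 is optional; it is absent only when Z1 is the first residue.
        x1 = seq[z1 - 1] if z1 > 0 else ""
        x2, x3 = seq[z1 + 1], seq[z1 + 3]
        preferred = (
            x1 in PREFERRED_X1
            and x2 in PREFERRED_X2_X3
            and x3 in PREFERRED_X2_X3
        )
        hits.append((z1, x1 + m.group(1), preferred))
    return hits


print(find_sulfatase_motifs("LCTPSRAALLTGR"))  # [(1, 'LCTPSR', True)]
print(find_sulfatase_motifs("LATPSRAALLTGR"))  # [] (Cys replaced by Ala)
```

A hit with Z₁ at index 0 has no X₁; under the proviso above, such a motif at the N-terminus of a polypeptide would not satisfy the formula, so a caller may wish to filter those hits out.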

In further embodiments, the method further comprises contacting the converted aldehyde tagged polypeptide with a reactive partner comprising a moiety of interest; wherein said contacting is under conditions to provide for production of a reaction product of a modified aldehyde tagged polypeptide having the moiety of interest covalently bound to the FGly residue of the heterologous sulfatase motif. The moiety of interest can be, e.g., a water-soluble polymer, a detectable label, a drug, or a moiety for immobilization of the polypeptide in a membrane or on a surface.

The disclosure also provides a converted aldehyde tagged polypeptide produced by the methods described herein, as well as a modified aldehyde tagged polypeptide produced by the methods described herein.

The disclosure further provides polypeptides comprising a heterologous sulfatase motif having a formylglycine generating enzyme (FGE), wherein the heterologous sulfatase motif comprises

X₁(FGly)X₂Z₂X₃R   (I)

where

FGly is a formylglycine residue;

Z₂ is a proline or alanine residue;

X₁ may be present or absent and, when present, is any amino acid, with the proviso that when the heterologous sulfatase motif is at an N-terminus of the aldehyde tagged polypeptide, X₁ is present; and

X₂ is any amino acid;

The polypeptide used in this method can have at least one of the following properties: the heterologous sulfatase motif is less than 16 amino acid residues in length, the heterologous sulfatase motif is positioned at an N-terminus of the polypeptide, the heterologous sulfatase motif is positioned at an internal site of an amino acid sequence native to the polypeptide, the heterologous sulfatase motif is positioned in a terminal loop of the polypeptide, the heterologous sulfatase motif is positioned at a site of post-translational modification of the polypeptide; the polypeptide is a full-length polypeptide, or the polypeptide is other than a preprolactin polypeptide, a prolactin polypeptide, or a glutathione-S-transferase polypeptide.

The heterologous sulfatase motif of such polypeptides can be less than 16 amino acid residues in length and can be positioned at a C-terminus of the polypeptide. The heterologous sulfatase motif can be present in a terminal loop of the polypeptide. The polypeptide can be a transmembrane protein with the heterologous sulfatase motif present at an internal site within an extracellular loop or an intracellular loop. The heterologous sulfatase motif of the polypeptide can be present at an internal site or at the N-terminus of the polypeptide, and is solvent-accessible when the polypeptide is folded. Further, the heterologous sulfatase motif can be present at a site of post-translational modification, such as a glycosylation site. The site of post-translational modification can be native to the parent target polypeptide or the target polypeptide can be engineered to include one or more non-native sites of post-translational modification, and wherein the heterologous sulfatase motif is positioned at said one or more non-native sites of post-translational modification. Of particular interest are sulfatase motifs where X₁, when present, is usually an aliphatic amino acid, a sulfur-containing amino acid, or a polar, uncharged amino acid, (i.e., other than an aromatic amino acid or a charged amino acid), and in certain embodiments is L, M, V, S or T. Further sulfatase motifs of particular interest are those where X₂ and X₃ are each independently an aliphatic amino acid, a polar, uncharged amino acid, or a sulfur containing amino acid (i.e., other than an aromatic amino acid or a charged amino acid), and in certain embodiments are each independently S, T, A, V, G or C.

The disclosure also contemplates nucleic acid molecules comprising a nucleotide sequence encoding such polypeptides, as well as vectors and recombinant host cells containing such nucleic acid molecules.

The disclosure also provides modified polypeptides comprising a formylglycine residue covalently attached to a moiety of interest, wherein the polypeptide comprises a modified sulfatase motif of the formula:

X₁(FGly′)X₂Z₂X₃R   (I)

where

FGly′ is the formylglycine residue having a heterologous, covalently attached moiety;

Z₂ is a proline or alanine residue;

X₁ may be present or absent and, when present, is any amino acid, usually an aliphatic amino acid, a sulfur-containing amino acid, or a polar, uncharged amino acid, (i.e., other than an aromatic amino acid or a charged amino acid), with the proviso that when the heterologous sulfatase motif is at an N-terminus of the polypeptide, X₁ is present; and

X₂ is any amino acid.

The moiety of such modified polypeptides can be a water-soluble polymer, a detectable label, a drug, or a moiety for immobilization of the polypeptide in a membrane or on a surface. The modified sulfatase motif of such modified polypeptides can be positioned in the modified polypeptide at a site of post-translational modification of a parent of the modified polypeptide. The site of post-translational modification can be, e.g., a glycosylation site. The site of post-translational modification can be native to the parent target polypeptide or the target polypeptide can be engineered to include one or more non-native sites of post-translational modification, and wherein the heterologous sulfatase motif is positioned at said one or more non-native sites of post-translational modification.

The disclosure also provides recombinant nucleic acids comprising an expression cassette comprising a first nucleic acid comprising an aldehyde tag-encoding sequence; and a restriction site positioned 5′ or 3′ of the aldehyde tag-encoding sequence, which restriction site provides for insertion of a second nucleic acid encoding a polypeptide of interest; and a promoter operably linked to the expression cassette to provide for expression of an aldehyde tagged-polypeptide produced by insertion of the second nucleic acid encoding the polypeptide of interest into the restriction site.

The present disclosure also identifies the formylglycine generating enzyme (FGE) of Mycobacterium tuberculosis (Mtb FGE), and thus encompasses methods of its use and reaction mixtures comprising an isolated Mycobacterium tuberculosis formylglycine generating enzyme (FGE); and a polypeptide comprising a heterologous sulfatase motif of the formula:

X₁Z₁X₂Z₂X₃R   (I)

where

Z₁ is cysteine or serine;

Z₂ is a proline or alanine residue;

X₁ is present or absent and, when present, is any amino acid, with the proviso that when the heterologous sulfatase motif is at an N-terminus of the polypeptide, X₁ is present; and

X₂ and X₃ are each independently any amino acid.

The present disclosure also provides reaction mixtures of a polypeptide having a heterologous sulfatase motif as described herein and an FGE, which may further include a converted aldehyde tagged polypeptide in which a heterologous sulfatase motif of the polypeptide contains an FGly residue. The disclosure also provides compositions comprising an FGE and a converted aldehyde tagged polypeptide in which a heterologous sulfatase motif of the polypeptide contains an FGly residue. In related embodiments, such reaction mixtures may further include a reagent to facilitate attachment of a moiety of interest to a FGly residue of a polypeptide.

Other features of the invention and its related disclosure are provided below, and will be readily apparent to the ordinarily skilled artisan upon reading the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is best understood from the following detailed description when read in conjunction with the accompanying drawings. It is emphasized that, according to common practice, the various features of the drawings are not to scale. On the contrary, the dimensions of the various features are arbitrarily expanded or reduced for clarity. Included in the drawings are the following figures:

FIG. 1A is a schematic showing an exemplary outline of the methods and compositions of the invention. In this example, an exemplary sulfatase motif (“LCTPSR”) is positioned in a construct containing a nucleic acid encoding a protein of interest.

FIG. 1B is a schematic showing a sequence alignment of the sulfatase motif from a variety of sulfatases found in diverse organisms. The consensus sequence contains the sequence of the two aldehyde tags used in this study. Conserved residues are highlighted.

FIG. 2 is a set of graphs showing mass spectrum analysis confirming presence of FGly in a tryptic peptide from mycobacterial sulfotransferase with a 13 residue sulfatase consensus motif (ald₁₃-Stf0). (Panel a) Mass spectrum of tryptic fragments incorporating FGly (M+2/2). Theoretical: 378.2066 m/z; Observed: 378.2065 m/z. (Panel b) M+2/2 FT-ICR spectrum of tryptic fragment incorporating FGly after treatment with methoxylamine. Theoretical: 392.7198 m/z; Observed: 392.7201 m/z. (Panel c) MALDI-TOF/TOF sequencing of the tryptic fragment incorporating FGly.

FIG. 3 illustrates results of quantitation of FGly production and selective fluorescent labeling of aldehyde tagged constructs. (Panel a) Standard addition of synthetic PL(FGly)TPSR to ald₆-Stf0 tryptic digest. (Panel b) Standard addition of synthetic PLCTPSR to ald₆-Stf0 tryptic digest. (Panel c) Selective fluorescent labeling of ald₁₃-Stf0, ald₆-Stf0 and ald₆-MBP with aminooxy-AlexaFluor 647 imaged directly on a fluorescent gel scanner. (Panel d) Protein loading was assessed by Sypro Ruby staining.

FIG. 4 is a set of images illustrating selective modification of aldehyde-tagged proteins. (Panel a) Switch assay of ald₆-MBP. Lane 1: protein incubated with biotin hydrazide. Lane 2: protein incubated with biotin hydrazide and subsequently modified with methoxylamine. Lane 3: protein incubated with biotin hydrazide and subsequently modified with aminooxyFLAG. Protein loading (bottom box) was assessed by Ponceau staining. (Panel b) PEGylation of ald₆-Stf0 with 5,000 Da aminooxyPEG (lane 1), 10,000 Da aminooxyPEG (lane 2) and 20,000 Da aminooxyPEG (lane 3). Due to the PEG chains' lack of charge, the PEGylated proteins migrate slower than non-PEGylated proteins of similar molecular weight.

FIG. 5 is a set of gel images showing production of an ald-tagged synthetic photoisomerizable azobenzene-regulated K⁺ (SPARK) channel protein in CHO (Chinese hamster ovary) and HEK (human embryonic kidney) cells. (Panel a) SDS-PAGE gel stained for protein using Ponceau S. (Panel b) Detection of binding of anti-myc antibody. (Panel c) Detection of binding of anti-FLAG antibody. V indicates the sample is a vector only negative control; P, C, and I represent three strategies for inserting an exemplary 6 residue ald-tag, namely adding the ald-tag within one of the extracellular loops (I), deletion of 6 residues from the loop and replacement with the ald-tag (C), or deleting 3 residues from the loop and then adding the 6 residue tag (P). (+) refers to a positive control sample, which is a CHO cell lysate containing a 17 kDa myc-tagged protein.

FIG. 6 illustrates generation of hydrazone, oxime and semicarbazone linkages. R and R′ refer to suitable substituents which appear in the aldehyde tagged polypeptide as disclosed herein. R″ refers to a substituent of the reagent which is transferred to the aldehyde tagged polypeptide in the reaction product.

FIG. 7 illustrates activation of sulfatases by formylglycine generating enzyme (FGE) and proposed sulfatase mechanism. (Panel a) FGE activates sulfatases by oxidizing an active site cysteine to a 2-formylglycyl residue (FGly). Previously determined sulfatase crystal structures indicate that the active site FGly is hydrated, suggesting that sulfate ester cleavage is mediated by a transesterification-elimination mechanism. (Panel b) The sulfatase motif is located towards the N-terminus of sulfatases and targets the appropriate cysteine (*) for modification by FGE. Boxed residues indicate an exact residue match; underlined residues indicate conserved residues; residues with a dot (•) indicate similar residues.

FIG. 8 provides results showing function of Mtb FGE (Rv0712) in vitro and in vivo. (Panel a) A synthetic peptide resembling a sulfatase motif was treated with recombinant Mtb FGE and the resulting oxidation of cysteine to FGly was monitored by mass spectrometry. The Cys263Ser FGE mutant was inactive on the peptide substrate. The ions at m/z 1427 and 1445 are sodium adducts of the modified and unmodified peptide, respectively. (Panel b) Upon treatment with biotin hydrazide, the FGly-containing peptide forms a hydrazone adduct with biotin, resulting in a mass shift of +240 Da. (Panel c) Lysates from wild-type (WT), Δfge, and complemented (Δfge+fge) strains of Mtb H37Rv were tested for sulfatase activity using the fluorogenic substrate 4-methylumbelliferyl sulfate (4MUS) with and without sulfatase/phosphatase inhibitors. Limpet sulfatase was used as a positive control. (Panel d) Lysates from WT, Δfge, and Δfge+fge strains of Mtb H37Rv were tested for phosphatase activity using the fluorogenic substrate 4-methylumbelliferyl phosphate with and without sulfatase/phosphatase inhibitors. The recombinant Mtb phosphatase PtpA was used as a positive control.

FIG. 9 provides results of Southern blot analysis of Mtb Δfge mutant. Genomic DNA was digested with FspI or NcoI, separated by agarose gel electrophoresis, and transferred to a nylon blot. The blot was probed with a 474 bp digoxigenin-labeled DNA fragment, identifying a 4.8 kb FspI fragment and 5.7 kb NcoI fragment for wild-type and 5.5 kb FspI fragment and 5.1 kb NcoI fragment for the mutant.

FIG. 10 shows data illustrating that recombinant Rv2407, Rv3406 and Rv3762c do not exhibit activity in the 4MUS assay. Limpet sulfatase was used as a positive control.

FIGS. 11A-E are schematics showing the structure of Strep FGE. (FIG. 11A) Stereo superposition of Strep FGE and human FGE. Strep FGE secondary structure elements are indicated. Ca²⁺ ions are rendered as spheres. Overall root mean square deviation is 0.65 Å. (FIG. 11B) Comparison of the residues surrounding Strep FGE's potential Ca²⁺ binding site and human FGE's second Ca²⁺ binding site. Proper coordination geometry is lost with a Glu66Ala mutation in Strep FGE. (FIG. 11C) Surface representation of Strep FGE's putative exosite. The surface, represented in grey scale, is colored according to residue conservation between all known and putative FGEs (based on amino acid sequence alignment); white represents non-conserved residues, light blue represents weakly conserved residues, medium blue represents conserved residues, and dark blue represents identical residues. The 6-residue peptide substrate is modeled from the human FGE-peptide complex structure (PDB entry 2AIK). A hypothetical extended peptide substrate is represented as a ribbon. (FIGS. 11D and 11E) Active site cysteines 272 and 277 appear to exist in a partial disulfide bond. Cys272 is shown in two alternate conformations. Electron density between Cys272 and Ser269 can be modeled as a water molecule (FIG. 11D) or a hydroperoxide with partial occupancy (FIG. 11E). Monomer D is shown. Omit electron density is contoured at 1 σ.

FIG. 12 is a table showing ICP-AES analysis of Strep FGE and Mtb FGE.

FIG. 13 provides data illustrating that Mtb and Strep FGE activity is dependent upon molecular oxygen but independent of metal cofactors. (a) A synthetic peptide resembling the sulfatase motif (mass=1421.7 Da) was used as a substrate for FGE. Conversion of cysteine to FGly resulted in a loss of 18 Da, which was detected by mass spectrometry. (b,g) Metal chelator EDTA had no effect on activity. (c,h,i) Loss of active site cysteines in Mtb and Strep FGE abolished activity. (d) Loss of active site Ser260 in Mtb FGE significantly reduced activity. (e) Mtb FGE was inactive in the absence of molecular oxygen. (f) WT Strep FGE was able to oxidize the synthetic peptide. (j) Active site Trp234 in Strep FGE is not essential for catalytic activity. The ion at m/z 1427 is the sodium adduct of the FGly-containing product peptide.
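The mass values recited for FIG. 13 (and the sodium adduct ions at m/z 1427 and 1445 recited for FIG. 8) can be checked with simple arithmetic. The Python sketch below is offered for illustration only. It assumes standard monoisotopic atomic masses, residue compositions of C₃H₅NOS for cysteine and C₃H₃NO₂ for FGly in its aldehyde form, and singly charged [M+Na]⁺ ions; whether the recited 1421.7 Da is a monoisotopic or an average mass is not stated, so the adduct values are approximate. The variable names are hypothetical.

```python
# Monoisotopic atomic masses in Da (standard values; assumed for this sketch).
H, C, N, O, S = 1.00782503, 12.0, 14.0030740, 15.9949146, 31.9720707
NA_PLUS = 22.9897693 - 0.00054858  # sodium atom minus one electron

# Residue masses: cysteine (C3H5NOS) and formylglycine aldehyde (C3H3NO2).
cys_residue = 3 * C + 5 * H + N + O + S
fgly_residue = 3 * C + 3 * H + N + 2 * O

# Net change on conversion of Cys to FGly (loss of H2S, gain of O).
delta = fgly_residue - cys_residue

substrate_mass = 1421.7                 # peptide mass recited for FIG. 13
product_mass = substrate_mass + delta   # FGly-containing product peptide

# Singly charged sodium adducts, [M+Na]+.
substrate_mz = substrate_mass + NA_PLUS
product_mz = product_mass + NA_PLUS

print(f"Cys -> FGly mass change: {delta:.4f} Da")      # -17.9928 Da
print(f"[M+Na]+ of substrate: m/z {substrate_mz:.1f}")  # m/z 1444.7
print(f"[M+Na]+ of product:   m/z {product_mz:.1f}")    # m/z 1426.7
```

Under these assumptions the computed change of about 18 Da and the adduct values of approximately 1445 and 1427 agree with the values recited for FIGS. 8 and 13.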

FIG. 14 provides data illustrating that active site cysteines 272 and 277 in Strep FGE are engaged in a partial disulfide bond. (a) Strep FGE was labeled with NBD and adducts were detected by mass spectrometry. Two populations of NBD-modified Strep FGE were observed. One population had a single NBD adduct, which corresponds to disulfide connected Cys272 and Cys277 (see b). The other population had three NBD adducts corresponding to Cys272 and Cys277 thiols (see c). Multiple charge states are shown for both populations. Numbers above each ion peak indicate the number of NBD adducts attributed to each population. (b-d) NBD adducts were mapped to each surface-exposed thiol using mass spectrometry after trypsinolysis. Shown are the +2 charge state ions corresponding to disulfide connected Cys272 and Cys277 (b, calculated mass=1,471.58; observed mass=1,471.58), NBD adducts on both Cys272 and Cys277 (c, calculated mass=1,799.58; observed mass=1,799.61), and an NBD adduct on Cys301 (d, calculated mass=1486.52; observed mass=1,486.54). (e) Cys301 was not observed as a free thiol (expected position of +2 charge state ion shown). Cys301 was an additional surface exposed thiol that served as an internal control to assess labeling efficiency.

FIG. 15 provides data illustrating that a sub-population of Strep FGE Cys277Ser does not contain a reduced Cys272 residue. (a) Strep FGE Cys277Ser was labeled with NBD and adducts were detected by mass spectrometry. Two populations of NBD-modified Strep FGE Cys277Ser were observed. One population corresponds to Cys272 as a free thiol with a single NBD adduct on Cys301 (see b and d). The other population had NBD adducts at both Cys272 and Cys301 (see c and d). Multiple charge states are shown for both populations. Numbers above each ion peak indicate the number of NBD adducts attributed to each population. (b-d) NBD adducts were mapped to each surface-exposed thiol using mass spectrometry after trypsinolysis. Shown are the +2 charge state ions corresponding to Cys272 as a free thiol (b, calculated mass=1,457.61; observed mass=1,457.60), Cys272 with an NBD adduct (c, calculated mass=1,620.62; observed mass=1,620.60), and Cys301 with an NBD adduct (d, calculated mass=1,486.54; observed mass=1486.54). (e) Cys301 was not observed as a free thiol (expected position of +2 charge state ion shown). Cys301 was an additional surface exposed thiol that served as an internal control to assess labeling efficiency.

FIG. 16 provides data illustrating that Strep FGE does not exhibit a mass anomaly indicative of a stable, covalent modification of Cys272. The His6-tag was cleaved from Strep FGE and the mass of the enzyme was determined by mass spectrometry. Numbers above each ion peak represent charge states.

FIG. 17 provides a schematic of site-specific labeling of recombinant IgG Fc, including a schematic of an antibody and results of modification of an Fc fragment that is either N-tagged (N-Ald₁₃-Fc) or C-tagged (C-Ald₁₃-Fc) with a 13 mer aldehyde tag (LCTPSRAALLTGR) or N- or C-modified with a control tag (LATPSRAALLTGR).

FIG. 18 provides a schematic of site-specific labeling of an IgG Fc fragment using a 6 mer aldehyde tag, and includes results of modification of an Fc fragment that is either N-tagged (N-Ald₆-Fc) or C-tagged (C-Ald₆-Fc) with a 6 mer aldehyde tag (LCTPSR) or N- or C-modified with a control tag (LATPSR).

FIG. 19 provides results of identification of formylglycine (FGly)-containing peptides from N-tagged IgG Fc, and includes a set of graphs showing mass spectrum analysis confirming presence of FGly in tryptic fragments of N-tagged Fc fragment. (Panel a) Mass spectrum of tryptic fragments incorporating FGly. Theoretical: 429.7268 m/z; Observed: 429.7321 m/z. (Panel b) Mass spectrum of tryptic fragment of N-tagged Fc fragment incorporating FGly after treatment with 2-iodoacetamide. Theoretical: 467.2375 m/z; Observed: 467.2410 m/z.

FIG. 20 provides results of identification of formylglycine (FGly)-containing peptides from C-tagged IgG Fc, and a set of graphs showing mass spectrum analysis confirming presence of FGly in tryptic fragments of C-tagged Fc fragment. (Panel a) Mass spectrum of tryptic fragments incorporating FGly. Theoretical: 508.7613 m/z; Observed: 508.7755 m/z. (Panel b) Mass spectrum of tryptic fragment of C-tagged Fc fragment incorporating FGly after treatment with 2-iodoacetamide. Theoretical: 546.2721 m/z; Observed: 546.2811 m/z.

FIG. 21 relates to site-specific labeling of a cell surface protein, and provides a schematic of the pDisplay™ vector used for construction of aldehyde tagged cell surface protein (using a 13 mer aldehyde tag of LCTPSRAALLTGR) and a graph showing increased mean fluorescence for surface protein tagged with the 13 mer (Ald13-TM) as compared to control (LATPSRAALLTGR; referred to as C→A-TM).

FIG. 22 relates to site-specific labeling of a cytosolic protein, exemplified by His₆-Ald₁₃-AcGFP, and provides the results of modification of green fluorescent protein (GFP) fusion protein containing a His tag and a 13 mer aldehyde tag (referred to as His₆-Ald₁₃-Ac-GFP or Ald-AcGFP) or a GFP fusion protein containing a control tag (LATPSRAALLTGR) (referred to as C→A-AcGFP).

FIG. 23 provides a schematic outlining site-specific glycosylation of interferon beta (IFN-Beta) using the aldehyde tag methodology.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

Before the present invention is described, it is to be understood that this invention is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.

Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Each smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range, and each range where either, neither or both limits are included in the smaller ranges is also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, some potential and exemplary methods and materials are now described. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. It is understood that the present disclosure supersedes any disclosure of an incorporated publication to the extent there is a contradiction.

It must be noted that as used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “an aldehyde tag” includes a plurality of such tags and reference to “the polypeptide” includes reference to one or more polypeptides and equivalents thereof known to those skilled in the art, and so forth.

It is further noted that the claims may be drafted to exclude any element which may be optional. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely”, “only” and the like in connection with the recitation of claim elements, or the use of a “negative” limitation.

The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.

Definitions

The terms “polypeptide”, “peptide” and “protein” are used interchangeably herein to refer to a polymeric form of amino acids of any length. Unless specifically indicated otherwise, “polypeptide”, “peptide” and “protein” can include genetically coded and non-coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones. The term includes fusion proteins, including, but not limited to, fusion proteins with a heterologous amino acid sequence, fusions with heterologous and homologous leader sequences, proteins which contain at least one N-terminal methionine residue (e.g., to facilitate production in a recombinant bacterial host cell); immunologically tagged proteins; and the like.

“Target polypeptide” is used herein to refer to a polypeptide that is to be modified by use of an aldehyde tag as described herein.

“Native amino acid sequence” or “parent amino acid sequence” are used interchangeably herein in the context of a target polypeptide to refer to the amino acid sequence of the target polypeptide prior to modification to include a heterologous aldehyde tag.

By “aldehyde tag” or “ald-tag” is meant an amino acid sequence that contains an amino acid sequence derived from a sulfatase motif which is capable of being converted, or which has been converted, by action of a formylglycine generating enzyme (FGE) to contain a 2-formylglycine residue (referred to herein as “FGly”). Although this is technically incorrect, the FGly residue generated by an FGE is often referred to in the literature as a “formylglycine”. Stated differently, the term “aldehyde tag” is used herein to refer to an amino acid sequence comprising an “unconverted” sulfatase motif (i.e., a sulfatase motif in which the cysteine or serine residue has not been converted to FGly by an FGE, but is capable of being converted) as well as to an amino acid sequence comprising a “converted” sulfatase motif (i.e., a sulfatase motif in which the cysteine or serine residue has been converted to FGly by action of an FGE).

“Conversion”, as used in the context of action of a formylglycine generating enzyme (FGE) on a sulfatase motif, refers to biochemical modification of a cysteine or serine residue in a sulfatase motif to a formylglycine (FGly) residue (e.g., Cys to FGly, or Ser to FGly).

“Modification” encompasses addition, removal, or alteration of a moiety. As used in the context of a polypeptide having a converted sulfatase motif, “modification” is meant to refer to chemical or biochemical modification of an FGly residue of an aldehyde tag of a polypeptide through reaction of the FGly aldehyde moiety with a reactive partner. As discussed above, the term “conversion” refers to a type of biochemical modification, mediated by an FGE, of a cysteine or serine residue of an aldehyde tag to an FGly residue.

By “genetically-encodable” as used in reference to an amino acid sequence of polypeptide, peptide or protein means that the amino acid sequence is composed of amino acid residues that are capable of production by transcription and translation of a nucleic acid encoding the amino acid sequence, where transcription and/or translation may occur in a cell or in a cell-free in vitro transcription/translation system.

The term “control sequences” refers to DNA sequences that facilitate expression of an operably linked coding sequence in a particular expression system, e.g. mammalian cell, bacterial cell, cell-free synthesis, etc. The control sequences that are suitable for prokaryote systems, for example, include a promoter, optionally an operator sequence, and a ribosome binding site. Eukaryotic cell systems may utilize promoters, polyadenylation signals, and enhancers.

A nucleic acid is “operably linked” when it is placed into a functional relationship with another nucleic acid sequence. For example, DNA for a presequence or secretory leader is operably linked to DNA for a polypeptide if it is expressed as a preprotein that participates in the secretion of the polypeptide; a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the sequence; or a ribosome binding site is operably linked to a coding sequence if it is positioned so as to facilitate the initiation of translation. Generally, “operably linked” means that the DNA sequences being linked are contiguous, and, in the case of a secretory leader, contiguous and in reading frame. Linking is accomplished by ligation or through amplification reactions. Synthetic oligonucleotide adaptors or linkers may be used for linking sequences in accordance with conventional practice.

The term “expression cassette” as used herein refers to a segment of nucleic acid, usually DNA, that can be inserted into a nucleic acid (e.g., by use of restriction sites compatible with ligation into a construct of interest or by homologous recombination into a construct of interest or into a host cell genome). In general, the nucleic acid segment comprises a polynucleotide that encodes a polypeptide of interest (e.g., an aldehyde tag, which can be operably linked to a polynucleotide encoding a target polypeptide of interest), and the cassette and restriction sites are designed to facilitate insertion of the cassette in the proper reading frame for transcription and translation. Expression cassettes can also comprise elements that facilitate expression of a polynucleotide encoding a polypeptide of interest in a host cell. These elements may include, but are not limited to: a promoter, a minimal promoter, an enhancer, a response element, a terminator sequence, a polyadenylation sequence, and the like.

As used herein the term “isolated” is meant to describe a compound of interest that is in an environment different from that in which the compound naturally occurs. “Isolated” is meant to include compounds that are within samples that are substantially enriched for the compound of interest and/or in which the compound of interest is partially or substantially purified.

As used herein, the term “substantially purified” refers to a compound that is removed from its natural environment and is at least 60% free, usually 75% free, and most usually 90% free from other components with which it is naturally associated.

The term “physiological conditions” is meant to encompass those conditions compatible with living cells, e.g., predominantly aqueous conditions of a temperature, pH, salinity, etc. that are compatible with living cells.

By “heterologous” is meant that a first entity and second entity are provided in an association that is not normally found in nature. For example, a protein containing a “heterologous” sulfatase motif or “heterologous” ald-tag is a protein that does not normally contain a sulfatase motif at that position within its amino acid sequence (e.g., proteins which have a single, native sulfatase motif can contain a second sulfatase motif that is “heterologous”; further proteins which contain a sulfatase motif can be modified so as to reposition the sulfatase motif, rendering the re-positioned sulfatase motif “heterologous” to the protein). In some embodiments, a heterologous sulfatase motif is present in a polypeptide which contains no native sulfatase motif.

By “reactive partner” is meant a molecule or molecular moiety that specifically reacts with another reactive partner to produce a reaction product. Exemplary reactive partners include a cysteine or serine of a sulfatase motif and a formylglycine generating enzyme (FGE), which react to form a reaction product of a converted aldehyde tag containing an FGly in lieu of cysteine or serine in the motif. Other exemplary reactive partners include an aldehyde of a formylglycine (FGly) residue of a converted aldehyde tag and a reactive partner reagent comprising a moiety of interest, which react to form a reaction product of a modified aldehyde tagged polypeptide having the moiety of interest conjugated to the aldehyde tagged polypeptide at the FGly residue.

“N-terminus” refers to the terminal amino acid residue of a polypeptide having a free amine group, which amine group in non-N-terminus amino acid residues normally forms part of the covalent backbone of the polypeptide.

“C-terminus” refers to the terminal amino acid residue of a polypeptide having a free carboxyl group, which carboxyl group in non-C-terminus amino acid residues normally forms part of the covalent backbone of the polypeptide.

By “N-terminal” is meant the region of a polypeptide that is closer to the N-terminus than to the C-terminus.

By “C-terminal” is meant the region of a polypeptide that is closer to the C-terminus than to the N-terminus.

By “internal site” as used in reference to a polypeptide or an amino acid sequence of a polypeptide is meant a region of the polypeptide that is not at the N-terminus or at the C-terminus, and includes both N-terminal and C-terminal regions of the polypeptide.

Introduction

The present invention exploits a naturally-occurring, genetically-encodable sulfatase motif for use as a peptide tag, referred to herein as an “aldehyde tag” or “ald-tag”, to direct site-specific modification of a polypeptide. The sulfatase motif of the aldehyde tag, which is based on a motif found in active sites of sulfatases, contains a serine or cysteine residue that is capable of being converted (oxidized) to a formylglycine (FGly) by action of a formylglycine generating enzyme (FGE) either in vivo (e.g., at the time of translation of an aldehyde tag-containing protein in a cell) or in vitro (e.g., by contacting an aldehyde tag-containing protein with an FGE in a cell-free system). The aldehyde moiety of the resulting FGly residue can be used as a “chemical handle” to facilitate site-specific chemical modification of the protein.

FIG. 1A is a schematic showing exemplary methods and compositions of the invention. In this example, an exemplary sulfatase motif (“LCTPSR”) is positioned in a construct containing a nucleic acid encoding a protein of interest. In this example, the sulfatase motif is positioned at the N-terminus of the encoded protein following expression; however, as described in more detail below, sulfatase motifs can be inserted at one or more desired sites of the polypeptide (e.g., to provide for the motif at the N-terminus, C-terminus and/or internal site of the encoded polypeptide). The sulfatase motif exemplified in FIG. 1A is within a genus of sulfatase motifs as described below in more detail. FIG. 1B is a schematic of a sequence alignment of the sulfatase motif from a variety of sulfatases found in diverse organisms. The consensus sequence contains the sequence of the two aldehyde tags used in this study. Conserved residues are highlighted.

Upon expression in a cell and/or exposure to the appropriate enzyme (e.g., AtsB-type or SUMF1-type FGE), the encoded cysteine of the sulfatase motif is converted to a formylglycine (FGly). The aldehyde of the FGly residue can be used as a “chemical handle” for a variety of applications, e.g., for covalent ligation with a moiety of interest or for applications such as protein immobilization. In FIG. 1A, the exemplary moiety is a detectable label which is attached to the modified cysteine residue of the sulfatase motif.

Both placement of the aldehyde tag within the target protein to be modified and aldehyde tag-mediated modification as disclosed herein are generalizable with respect to a wide variety of proteins. The ability of FGE to facilitate conversion of the sulfatase motif to generate an FGly residue is independent of the position of the motif within the protein. Because FGE can convert the cysteine/serine of the sulfatase motif in a manner that is both sequence context-independent and structural context-independent, aldehyde tags can be positioned at any desired site within a target polypeptide to be modified, with the proviso that the sulfatase motif is accessible to the FGE at the time of enzymatic conversion. Furthermore, the unique reactivity of the aldehyde allows for bioorthogonal and chemoselective modification of recombinant proteins, thus providing a site-specific means for chemical modification of proteins that can be conducted under physiological conditions and in a highly selective manner.

As will be appreciated from the present disclosure, the applications of aldehyde tags are numerous and can provide a number of advantages. First, the aldehyde tag is smaller than most if not all conventional peptide tags that allow for covalent modification of proteins, thereby requiring minimal changes to the amino acid sequence of a target polypeptide. Second, the aldehyde tag takes advantage of well-characterized secondary labeling chemistries. Third, the aldehyde tag demonstrates reversibility, and through selection of reactive partners that provide for moiety conjugation through covalent bonds of differing stability, allows for sequential modification and replacement of a moiety attached at an aldehyde tag. Further, because the aldehyde tag is formed using biosynthetic machinery already present in most cellular systems, and is independent of the nature of the target or placement within the parent amino acid sequence, the aldehyde tag can be used to facilitate modification of a large number of polypeptides using readily available expression systems.

The aldehyde moiety of a converted aldehyde tag can be used for a variety of applications including, but not limited to, visualization using fluorescence or epitope labeling (e.g., electron microscopy using gold particles equipped with aldehyde reactive groups), protein immobilization (e.g., protein microarray production), protein dynamics and localization studies and applications, and conjugation of proteins with a moiety of interest (e.g., moieties that improve a parent protein's therapeutic index (e.g., PEG), targeting moieties (e.g., to enhance bioavailability to a site of action), and biologically active moieties (e.g., a therapeutic moiety)).

Of particular interest is the use of aldehyde tags to facilitate site-specific attachment of a water-soluble polymer, such as PEG. Despite advances in protein conjugation chemistries, controlled, site-specific modification of proteins remains a challenge. Many conventional PEGylation methods attach PEG moieties through reaction with, for example, a lysine or cysteine as a target residue. Due to the presence of multiple target residues in a protein, such conventional systems can result in PEGylation at multiple sites, creating a collection of discrete protein-PEG conjugates with different pharmacokinetics. In contrast, use of an FGly residue of an aldehyde tag as a target residue provides a unique site for covalent polymer attachment, and thus increases both specificity and homogeneity of the resulting modified product. These and other features and advantages will be readily apparent to the ordinarily skilled artisan upon reading the present disclosure.

The methods and compositions for practice of the invention will now be described in more detail.

Aldehyde Tags

In general, an aldehyde tag can be based on any amino acid sequence derived from a sulfatase motif (also referred to as a “sulfatase domain”) which is capable of being converted by action of a formylglycine generating enzyme (FGE) to contain a formylglycine (FGly). Action of FGE is directed in a sequence-specific manner in that the FGE acts at a sulfatase motif, but this sulfatase motif can be positioned within any region of a target polypeptide. Thus, FGE-mediated conversion of a sulfatase motif is site-specific (i.e., in that FGE acts at the amino acid sequence of a sulfatase motif) but the ability of FGE to act upon the sulfatase motif is sequence context-independent (i.e., the ability of the FGE to convert a cysteine/serine of a sulfatase motif is independent of the sequence context in which the sulfatase motif is presented in the target polypeptide).

Exemplary Aldehyde Tags

A minimal sulfatase motif of an aldehyde tag is usually about 5 or 6 amino acid residues in length, usually no more than 6 amino acid residues in length. In general, it is normally desirable to minimize the extent of modification of the native amino acid sequence of the target polypeptide, so as to minimize the number of amino acid residues that are inserted, deleted, substituted (replaced), or added (e.g., to the N- or C-terminus). Minimizing the extent of amino acid sequence modification of the target polypeptide is usually desirable so as to minimize the impact such modifications may have upon target polypeptide function and/or structure. Thus, aldehyde tags of particular interest include those that require modification (insertion, addition, deletion, substitution/replacement) of less than 16, 15, 14, 13, 12, 11, 10, 9, 8, or 7 amino acid residues of the amino acid sequence of the target polypeptide.

It should be noted that while aldehyde tags of particular interest are those based on a minimal sulfatase motif, it will be readily appreciated that longer aldehyde tags are both contemplated and encompassed by the present disclosure and can find use in the compositions and methods of the invention. Aldehyde tags can thus comprise a minimal sulfatase motif of 5 or 6 residues, or can be longer and comprise a minimal sulfatase motif which can be flanked at the N- and/or C-terminal sides of the motif by additional amino acid residues. Aldehyde tags of, for example, 5 or 6 amino acid residues are contemplated, as well as longer amino acid sequences of more than 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acid residues.

In general, sulfatase motifs useful in aldehyde tags as described herein are of the formula:

X₁Z₁X₂Z₂X₃R   (I)

where

Z₁ is cysteine or serine (which can also be represented by (C/S));

Z₂ is either a proline or alanine residue (which can also be represented by (P/A));

X₁ is present or absent and, when present, can be any amino acid, though usually an aliphatic amino acid, a sulfur-containing amino acid, or a polar, uncharged amino acid (i.e., other than an aromatic amino acid or a charged amino acid), usually L, M, V, S or T, more usually L, M, S or V, with the proviso that when the sulfatase motif is at the N-terminus of the target polypeptide, X₁ is present; and

X₂ and X₃ independently can be any amino acid, though usually an aliphatic amino acid, a polar, uncharged amino acid, or a sulfur-containing amino acid (i.e., other than an aromatic amino acid or a charged amino acid), usually S, T, A, V, G or C, more usually S, T, A, V or G.

It should be noted that, following action of an FGE on the sulfatase motif, Z₁ is oxidized to generate a formylglycine (FGly) residue. Furthermore, following both FGE-mediated conversion and reaction with a reactive partner comprising a moiety of interest, the FGly at position Z₁ in the formula above is covalently bound to the moiety of interest (e.g., detectable label, water soluble polymer, etc.).
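Formula (I) lends itself to a simple pattern search over a one-letter amino acid sequence. The sketch below is offered only as an illustration, under the assumption that "any amino acid" means the 20 standard residues; the function name `find_sulfatase_motifs` and the test sequence are hypothetical and do not appear in the disclosure.

```python
import re

# Formula (I) core: Z1 (C/S) - X2 - Z2 (P/A) - X3 - R, with X2 and X3 any residue.
AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes (assumption)
CORE = re.compile(rf"(?=([CS][{AA}][PA][{AA}]R))")  # lookahead keeps overlapping hits


def find_sulfatase_motifs(sequence):
    """Return (core_start, x1, core) for each Formula (I) core in `sequence`.

    x1 is the residue immediately preceding Z1, or "" when the core starts
    the sequence (a position at which Formula (I) requires X1 to be present).
    """
    seq = sequence.upper()
    hits = []
    for match in CORE.finditer(seq):
        start = match.start()
        x1 = seq[start - 1] if start > 0 else ""
        hits.append((start, x1, match.group(1)))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence carrying the exemplary LCTPSR motif
    print(find_sulfatase_motifs("MGLCTPSRAA"))
```

A hit whose `x1` is empty sits at the very N-terminus and would not satisfy the proviso that X₁ be present there.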

Where the aldehyde tag is present at a location other than the N-terminus of the target polypeptide, X₁ of the formula above can be provided by an amino acid residue of the native amino acid sequence of the target polypeptide. Therefore, in some embodiments, and when present at a location other than the N-terminus of a target polypeptide, sulfatase motifs are of the formula:

(C/S)X₁(P/A)X₂R   (II)

where X₁ and X₂ independently can be any amino acid, though usually an aliphatic amino acid, a polar, uncharged amino acid, or a sulfur-containing amino acid (i.e., other than an aromatic amino acid or a charged amino acid), usually S, T, A, V, or C, more usually S, T, A, or V.

As noted above, the sulfatase motif can contain additional residues at one or both of the N- and C-terminus of the sequence, e.g., such that the aldehyde tag includes both a sulfatase motif and an “auxiliary motif”. In one embodiment, the sulfatase motif includes an auxiliary motif at the C-terminus (i.e., following the arginine residue in the formula above) comprising 1, 2, 3, 4, 5, 6, or all 7 of the contiguous residues of an amino acid sequence of AALLTGR, SQLLTGR, AAFMTGR, AAFLTGR, SAFLTGR, ASILTGK, VSFLTGR, ASLLTGL, ASILITG, VSFLTGR, SAIMTGR, SAIVTGR, TNLWRG, TNLWRGQ, TNLCAAS, VSLWTGK, SMLLTG, SMLLTGN, SMLLTGT, ASFMAGQ, or ASLLTGL (see, e.g., Dierks et al. (1999) EMBO J 18(8): 2084-2091), or of GSLFTGR. However, as set out in the Examples below, the present inventors have found that such additional C-terminal amino acid residues are not required for FGE-mediated conversion of the sulfatase motif of the aldehyde tag, and thus are only optional and may be specifically excluded from the aldehyde tags described herein. In some embodiments the aldehyde tag does not contain an amino acid sequence CGPSR(M/A)S or CGPSR(M/A), which may be present as a native amino acid sequence in phosphonate monoester hydrolases.

The sulfatase motif of the aldehyde tag is generally selected so as to be capable of conversion by a selected FGE, e.g., an FGE present in a host cell in which the aldehyde tagged polypeptide is expressed or an FGE which is to be contacted with the aldehyde tagged polypeptide in a cell-free in vitro method.

Selection of aldehyde tags and an FGE that provide for suitable reactive partners to provide for generation of an FGly in the aldehyde tagged target polypeptide can be readily accomplished in light of information available in the art. In general, sulfatase motifs susceptible to conversion by a eukaryotic FGE contain a cysteine and a proline (i.e., a cysteine and proline at Z₁ and Z₂, respectively, in Formula I above (e.g., X₁CX₂PX₃R); CX₁PX₂R in Formula II above) and are modified by the “SUMF1-type” FGE (Cosma et al. Cell 2003, 113, (4), 445-56; Dierks et al. Cell 2003, 113, (4), 435-44). Sulfatase motifs susceptible to conversion by a prokaryotic FGE contain either a cysteine or a serine, and a proline in the sulfatase motif (i.e., a cysteine or serine at Z₁, and a proline at Z₂, respectively, in Formula I above (e.g., X₁(C/S)X₂PX₃R); (C/S)X₁PX₂R in Formula II above), and are modified either by the “SUMF1-type” FGE or the “AtsB-type” FGE, respectively (Szameit et al. J Biol Chem 1999, 274, (22), 15375-81). Other sulfatase motifs susceptible to conversion by a prokaryotic FGE contain either a cysteine or a serine, and either a proline or an alanine in the sulfatase motif (i.e., a cysteine or serine at Z₁, and a proline or alanine at Z₂, respectively, in Formula I above (e.g., X₁CX₂PX₃R; X₁SX₂PX₃R; X₁CX₂AX₃R; X₁SX₂AX₃R); CX₁PX₂R; SX₁PX₂R; CX₁AX₂R; SX₁AX₂R in Formula II above), and can be modified by, for example, an FGE of a Firmicutes (e.g., Clostridium perfringens) (see Berteau et al. J. Biol. Chem. 2006; 281:22464-22470).
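The pairing of motif residues with FGE types described above can be summarized as a small lookup. The sketch below encodes only the rules stated in the preceding paragraph and is not an experimentally validated predictor; the function name `candidate_fges` and the label strings are illustrative assumptions.

```python
def candidate_fges(z1, z2):
    """List FGE types that, per the preceding paragraph, may convert a motif
    with residue `z1` at position Z1 and residue `z2` at position Z2."""
    z1, z2 = z1.upper(), z2.upper()
    candidates = []
    if z1 == "C" and z2 == "P":
        # Cys + Pro: SUMF1-type FGE (eukaryotic or prokaryotic)
        candidates.append("SUMF1-type FGE")
    if z1 == "S" and z2 == "P":
        # Ser + Pro: AtsB-type FGE (prokaryotic)
        candidates.append("AtsB-type FGE")
    if z1 in {"C", "S"} and z2 in {"P", "A"}:
        # Cys/Ser + Pro/Ala: FGE of a Firmicutes, e.g., Clostridium perfringens
        candidates.append("Firmicutes FGE")
    return candidates


if __name__ == "__main__":
    for z1, z2 in [("C", "P"), ("S", "P"), ("C", "A"), ("S", "A")]:
        print(z1, z2, candidate_fges(z1, z2))
```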

Therefore, for example, where the FGE is a eukaryotic FGE (e.g., a mammalian FGE, including a human FGE), the sulfatase motif is usually of the formula:

X₁CX₂PX₃R

where

X₁ may be present or absent and, when present, can be any amino acid, though usually an aliphatic amino acid, a sulfur-containing amino acid, or a polar, uncharged amino acid (i.e., other than an aromatic amino acid or a charged amino acid), usually L, M, S or V, with the proviso that when the sulfatase motif is at the N-terminus of the target polypeptide, X₁ is present; and

X₂ and X₃ independently can be any amino acid, though usually an aliphatic amino acid, a sulfur-containing amino acid, or a polar, uncharged amino acid (i.e., other than an aromatic amino acid or a charged amino acid), usually S, T, A, V, G, or C, more usually S, T, A, V or G.

Specific examples of sulfatase motifs include LCTPSR, MCTPSR, VCTPSR, LCSPSR, LCAPSR, LCVPSR, and LCGPSR. Other specific sulfatase motifs are readily apparent from the disclosure provided herein.
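As a consistency check, the listed examples can be tested against the eukaryotic formula X₁CX₂PX₃R using the residue preferences given above (X₁ usually L, M, S or V; X₂ and X₃ more usually S, T, A, V or G). This is a minimal sketch; the pattern name and the helper `is_usual_eukaryotic_motif` are illustrative.

```python
import re

# X1 in {L, M, S, V}; Z1 = C; X2 and X3 in {S, T, A, V, G}; Z2 = P; terminal R
USUAL_EUKARYOTIC = re.compile(r"[LMSV]C[STAVG]P[STAVG]R")

EXAMPLES = ["LCTPSR", "MCTPSR", "VCTPSR", "LCSPSR", "LCAPSR", "LCVPSR", "LCGPSR"]


def is_usual_eukaryotic_motif(motif):
    """True if `motif` is a six-residue match to the 'usual' eukaryotic pattern."""
    return USUAL_EUKARYOTIC.fullmatch(motif.upper()) is not None


if __name__ == "__main__":
    for motif in EXAMPLES:
        print(motif, is_usual_eukaryotic_motif(motif))
```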

As described in more detail below, a converted aldehyde tagged polypeptide is reacted with a reactive partner containing a moiety of interest to provide for conjugation of the moiety of interest to the FGly residue of the converted aldehyde tagged polypeptide, and production of a modified polypeptide. Modified polypeptides having a modified aldehyde tag are generally described as comprising a modified sulfatase motif of the formula:

X₁(FGly′)X₂Z₂X₃R   (I)

where

FGly′ is the formylglycine residue having a covalently attached moiety;

Z₂ is either a proline or alanine residue (which can also be represented by (P/A));

X₁ may be present or absent and, when present, can be any amino acid, though usually an aliphatic amino acid, a sulfur-containing amino acid, or a polar, uncharged amino acid (i.e., other than an aromatic amino acid or a charged amino acid), usually L, M, V, S or T, more usually L, M or V, with the proviso that when the sulfatase motif is at the N-terminus of the target polypeptide, X₁ is present; and

X₂ and X₃ independently can be any amino acid, though usually an aliphatic amino acid, a sulfur-containing amino acid, or a polar, uncharged amino acid (i.e., other than an aromatic amino acid or a charged amino acid), usually S, T, A, V, G or C, more usually S, T, A, V or G.

Specific examples of converted sulfatase motifs include L(FGly)TPSR, M(FGly)TPSR, V(FGly)TPSR, L(FGly)SPSR, L(FGly)APSR, L(FGly)VPSR, and L(FGly)GPSR.
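The converted-motif notation used above can be derived mechanically from an unconverted sequence by rewriting the Z₁ residue as "(FGly)". The sketch below is illustrative only: it rewrites notation and says nothing about whether a given FGE would actually convert the site. The function name `converted_notation` is an assumption.

```python
import re

# A C or S that is followed by X2, (P/A), X3, R, i.e. Z1 of a Formula (I) motif
Z1_SITE = re.compile(r"[CS](?=[A-Z][PA][A-Z]R)")


def converted_notation(sequence):
    """Rewrite each Z1 residue of a sulfatase motif as '(FGly)'."""
    return Z1_SITE.sub("(FGly)", sequence.upper())


if __name__ == "__main__":
    for motif in ["LCTPSR", "MCTPSR", "LCGPSR"]:
        print(motif, "->", converted_notation(motif))
```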

As described in more detail below, the moiety of interest can be any of a variety of moieties such as a water-soluble polymer, a detectable label, a drug, or a moiety for immobilization of the polypeptide in a membrane or on a surface. As is evident from the above discussion of aldehyde tagged polypeptides, the modified sulfatase motif of the modified polypeptide can be positioned at any desired site of the polypeptide. Thus, the present disclosure provides, for example, a modified polypeptide having a modified sulfatase motif positioned at a site of post-translational modification of a parent of the modified polypeptide (i.e., if the target polypeptide is modified to provide an aldehyde tag at a site of post-translational modification, the later-produced modified polypeptide will contain a moiety at a position corresponding to this site of post-translational modification in the parent polypeptide). For example, then, a modified polypeptide can be produced so as to have a covalently bound, water-soluble polymer at a site corresponding to a site at which glycosylation would normally occur in the parent target polypeptide. Thus, for example, a PEGylated polypeptide can be produced having the PEG moiety positioned at the same or nearly the same location as sugar residues would be positioned in the naturally-occurring parent polypeptide. Similarly, where the parent target polypeptide is engineered to include one or more non-native sites of post-translational modification, the modified polypeptide can contain covalently attached water-soluble polymers at one or more sites of the modified polypeptide corresponding to these non-native sites of post-translational modification in the parent polypeptide.

Modification of a Target Polypeptide to Include an Aldehyde Tag

Aldehyde tags can be positioned at any location within a target polypeptide at which it is desired to provide for conversion and/or modification of the target polypeptide, with the proviso that the site of the aldehyde tag is accessible for conversion by an FGE and subsequent modification at the FGly, or can be rendered accessible (e.g., by denaturing the protein). Target polypeptides can be modified to include one or more aldehyde tags. The number of aldehyde tags that can be present in a target polypeptide will vary with the target polypeptide selected, and may include 1, 2, 3, 4, 5, or more aldehyde tags.

In some embodiments it is desirable to position the aldehyde tag(s) in the target polypeptide taking into account its structure when folded (e.g., in a cell-free environment, usually a cell-free physiological environment) and/or presented in or on a cell membrane (e.g., for cell-membrane associated polypeptides, such as transmembrane proteins). For example, an aldehyde tag can be positioned at a solvent accessible site in the folded target polypeptide. The solvent accessible aldehyde tag in a folded unconverted aldehyde tagged polypeptide is thus accessible to an FGE for conversion of the serine or cysteine to an FGly. Likewise, a solvent accessible aldehyde tag of a converted aldehyde tagged polypeptide is accessible to a reactive partner reagent for conjugation to a moiety of interest to provide a modified aldehyde tagged polypeptide. Where an aldehyde tag is positioned at a solvent accessible site, in vitro FGE-mediated conversion and modification with a moiety by reaction with a reactive partner can be performed without the need to denature the protein. Solvent accessible sites can also include target polypeptide regions that are exposed at an extracellular or intracellular cell surface when expressed in a host cell (e.g., other than a transmembrane region of the target polypeptide).

Accordingly, or more aldehyde tags can be provided at sitesindependently selected from, for example, a solvent accessibleN-terminus, a solvent accessible N-terminal region, a solvent accessibleC-terminus, a solvent accessible C-terminal region, and/or a loopstructure (e.g., an extracellular loop structure and/or an intracellularloop structure). In some embodiments, the aldehyde tag is positioned ata site other than the C-terminus of the polypeptide. In otherembodiments, the polypeptide in which the aldehyde tag is positioned isa full-length polypeptide.

In other embodiments, an aldehyde tag site is positioned at a site whichis post-translationally modified in the native target polypeptide. Forexample, an aldehyde tag can be introduced at a site of glycosylation(e.g., N-glycosylation, O-glycosylation), phosphorylation, sulftation,ubiquitination, acylation, methylation, prenylation, hydroxylation,carboxylation, and the like in the native target polypeptide. Consensussequences of a variety of post-translationally modified sites, andmethods for identification of a post-translationally modified site in apolypeptide, are well known in the art. It is understood that the siteof post-translational modification can be naturally-occurring or such asite of a polypeptide that has been engineered (e.g., throughrecombinant techniques) to include a post-translational modificationsite that is non-native to the polypeptide (e.g., as in a glycosylationsite of a hyperglycosylated variant of EPO). In the latter embodiment,polypeptides that have a non-native post-translational modification siteand which have been demonstrated to exhibit a biological activity ofinterest are of particular interest.

The disclosure also provides methods for identifying suitable sites for modification of a target polypeptide to include an aldehyde tag. For example, one or more aldehyde tagged-target polypeptide constructs can be produced, and the constructs expressed in a cell expressing an FGE, or exposed to FGE following isolation from the cell (as described in more detail below). The aldehyde tagged-polypeptide can then be contacted with a reactive partner that, if the aldehyde tag is accessible, provides for attachment of a detectable moiety to the FGly of the aldehyde tag. The presence or absence of the detectable moiety is then determined. If the detectable moiety is detected, then positioning of the aldehyde tag in the polypeptide was successful. In this manner, a library of constructs having an aldehyde tag positioned at different sites in the coding sequence of the target polypeptide can be produced and screened to facilitate identification of an optimal position of an aldehyde tag. In addition or alternatively, the aldehyde tagged-polypeptide can be tested for a biological activity normally associated with the target polypeptide, and/or the structure of the aldehyde tagged-polypeptide assessed (e.g., to assess whether an epitope normally present on an extracellular cell surface in the native target polypeptide is also present in the aldehyde tagged-polypeptide).
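The library-and-screen approach described above can be sketched computationally. The following is a minimal illustration only: the 6-residue tag LCTPSR, the short target sequence, and the function names are assumptions made for the example, and the screening readout is a hypothetical placeholder for an experimental result rather than anything computed.

```python
from typing import Dict, Iterator, List, Tuple

# Illustrative 6-residue aldehyde tag; assumed for this sketch only.
ALDEHYDE_TAG = "LCTPSR"


def insertion_library(target: str, tag: str = ALDEHYDE_TAG) -> Iterator[Tuple[int, str]]:
    """Yield (position, construct) with the tag inserted before each index.

    Position 0 corresponds to an N-terminal addition and position
    len(target) to a C-terminal addition.
    """
    for pos in range(len(target) + 1):
        yield pos, target[:pos] + tag + target[pos:]


def successful_positions(readout: Dict[int, bool]) -> List[int]:
    """Return the positions at which the detectable moiety was observed."""
    return sorted(pos for pos, detected in readout.items() if detected)


# Hypothetical usage: build the library, then record which constructs labeled.
library = dict(insertion_library("MKTAYIAK"))
readout = {0: True, 3: False, 8: True}  # placeholder experimental results
accessible = successful_positions(readout)
```

Each construct in the library would still need to be expressed, exposed to an FGE, and tested with a reactive partner; the code only enumerates candidates and tabulates outcomes.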

An aldehyde tag can be provided in a target polypeptide by insertion (e.g., so as to provide a 5 or 6 amino acid residue insertion within the native amino acid sequence) or by addition (e.g., at an N- or C-terminus of the target polypeptide). An aldehyde tag can also be provided by complete or partial substitution of native amino acid residues with the contiguous amino acid sequence of an aldehyde tag. For example, a heterologous aldehyde tag of 5 (or 6) amino acid residues can be provided in a target polypeptide by replacing 1, 2, 3, 4, or 5 (or 1, 2, 3, 4, 5, or 6) amino acid residues of the native amino acid sequence with the corresponding amino acid residues of the aldehyde tag. Although it generally may be of less interest in many applications, target polypeptides having more than one aldehyde tag can be modified so as to provide for attachment of the same moiety or of different moieties at the FGly of the tag.
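Partial substitution can be illustrated by counting, for each window of the native sequence, how many residues would have to be replaced to create the tag. This is a sketch under stated assumptions: the tag LCTPSR and the example target are illustrative, the function names are hypothetical, and the ranking ignores structure, solvent accessibility, and activity, all of which would matter in practice.

```python
from typing import List, Tuple

ALDEHYDE_TAG = "LCTPSR"  # illustrative tag; an assumption for this sketch


def substitutions_needed(window: str, tag: str = ALDEHYDE_TAG) -> int:
    """Count residues in `window` that differ from the tag at the same offset."""
    if len(window) != len(tag):
        raise ValueError("window and tag must have the same length")
    return sum(1 for native, wanted in zip(window, tag) if native != wanted)


def rank_substitution_sites(target: str, tag: str = ALDEHYDE_TAG) -> List[Tuple[int, int]]:
    """Return (substitutions, start) pairs, fewest substitutions first."""
    n = len(tag)
    sites = [
        (substitutions_needed(target[i:i + n], tag), i)
        for i in range(len(target) - n + 1)
    ]
    return sorted(sites)


def substitute(target: str, start: int, tag: str = ALDEHYDE_TAG) -> str:
    """Replace len(tag) native residues beginning at `start` with the tag."""
    return target[:start] + tag + target[start + len(tag):]
```

For a hypothetical target "MALATPSKGG", the window starting at index 2 ("LATPSK") already shares four residues with the tag, so only two substitutions are needed there.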

Modification of a target polypeptide to include one or more aldehyde tags can be accomplished using recombinant molecular genetic techniques, so as to produce nucleic acid encoding the desired aldehyde tagged target polypeptide. Such methods are well known in the art, and include cloning methods, site-specific mutation methods, and the like (see, e.g., Sambrook et al., In “Molecular Cloning: A Laboratory Manual” (Cold Spring Harbor Laboratory Press 1989); “Current Protocols in Molecular Biology” (eds., Ausubel et al.; Greene Publishing Associates, Inc., and John Wiley & Sons, Inc. 1990 and supplements)). Alternatively, an aldehyde tag can be added using non-recombinant techniques, e.g., using native chemical ligation or pseudo-native chemical ligation, e.g., to add an aldehyde tag to a C-terminus of the target polypeptide (see, e.g., U.S. Pat. Nos. 6,184,344; 6,307,018; 6,451,543; 6,570,040; US 2006/0173159; US 2006/0149039). See also Rush et al. (Jan. 5, 2006) Org Lett. 8(1):131-4.

Target Polypeptides

Any of a wide variety of polypeptides can be modified to include an aldehyde tag to facilitate modification of the polypeptide. Polypeptides suitable for aldehyde tag-based modification include proteins having a naturally-occurring amino acid sequence, a native amino acid sequence having an N-terminal methionine, fragments of naturally-occurring polypeptides, and non-naturally occurring polypeptides and fragments thereof. In some embodiments, the target polypeptide is a polypeptide other than a sulfatase or fragment thereof, other than a reporter protein, or other than preprolactin or prolactin.

The following are exemplary classes and types of polypeptides which are of interest for modification using the aldehyde tag-based methods described herein.

Therapeutic Polypeptides

In one embodiment, the aldehyde tag-based methods of protein modification are applied to modification of polypeptides that may provide for a therapeutic benefit, particularly those polypeptides for which attachment to a moiety can provide for one or more of, for example, an increase in serum half-life, a decrease in an adverse immune response, additional or alternate biological activity or functionality, and the like, or other benefit or reduction of an adverse side effect. Where the therapeutic polypeptide is an antigen for a vaccine, modification can provide for an enhanced immunogenicity of the polypeptide.

Examples of classes of therapeutic proteins include those that are cytokines, chemokines, growth factors, hormones, antibodies, and antigens. Further examples include erythropoietin (EPO, e.g., native EPO, synthetic EPO (see, e.g., US 2003/0191291)), human growth hormone (hGH), bovine growth hormone (bGH), follicle stimulating hormone (FSH), interferon (e.g., IFN-gamma, IFN-beta, IFN-alpha, IFN-omega, consensus interferon, and the like), insulin, insulin-like growth factor (e.g., IGF-I, IGF-II), blood factors (e.g., Factor VIII, Factor IX, Factor X, tissue plasminogen activator (TPA), and the like), colony stimulating factors (e.g., granulocyte-CSF (G-CSF), macrophage-CSF (M-CSF), granulocyte-macrophage-CSF (GM-CSF), and the like), transforming growth factors (e.g., TGF-beta, TGF-alpha), interleukins (e.g., IL-1, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-12, and the like), epidermal growth factor (EGF), platelet-derived growth factor (PDGF), fibroblast growth factors (FGFs, e.g., aFGF, bFGF), glial cell line-derived growth factor (GDNF), nerve growth factor (NGF), RANTES, and the like.

Further examples include antibodies, e.g., polyclonal antibodies, monoclonal antibodies, humanized antibodies, antigen-binding fragments (e.g., F(ab)′, Fab, Fv), single chain antibodies, and the like. Of particular interest are antibodies that specifically bind to a tumor antigen, an immune cell antigen (e.g., CD4, CD8, and the like), an antigen of a microorganism, particularly a pathogenic microorganism (e.g., a bacterial, viral, fungal, or parasitic antigen), and the like.

The methods and compositions described herein can be applied to provide for a moiety (e.g., a water-soluble polymer) at a native or engineered site of glycosylation, such as found in hyperglycosylated forms of a protein therapeutic, such as, for example: an interferon (e.g., IFN-γ, IFN-α, IFN-β; IFN-ω; IFN-τ); an insulin (e.g., Novolin, Humulin, Humalog, Lantus, Ultralente, etc.); an erythropoietin (e.g., PROCRIT®, EPREX®, or EPOGEN® (epoetin-α); ARANESP® (darbepoetin-α); NEORECORMON®, EPOGIN® (epoetin-β); and the like); an antibody (e.g., a monoclonal antibody) (e.g., RITUXAN® (rituximab); REMICADE® (infliximab); HERCEPTIN® (trastuzumab); HUMIRA™ (adalimumab); XOLAIR® (omalizumab); BEXXAR® (tositumomab); RAPTIVA™ (efalizumab); ERBITUX™ (cetuximab); and the like), including an antigen-binding fragment of a monoclonal antibody; a blood factor (e.g., ACTIVASE® (alteplase) tissue plasminogen activator; NOVOSEVEN® (recombinant human factor VIIa); Factor VIIa; Factor VIII (e.g., KOGENATE®); Factor IX; β-globin; hemoglobin; and the like); a colony stimulating factor (e.g., NEUPOGEN® (filgrastim; G-CSF); Neulasta (pegfilgrastim); granulocyte colony stimulating factor (G-CSF), granulocyte-monocyte colony stimulating factor, macrophage colony stimulating factor, megakaryocyte colony stimulating factor; and the like); a growth hormone (e.g., a somatotropin, e.g., GENOTROPIN®, NUTROPIN®, NORDITROPIN®, SAIZEN®, SEROSTIM®, HUMATROPE®, etc.; a human growth hormone; and the like); an interleukin (e.g., IL-1; IL-2, including, e.g., Proleukin®; IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9; etc.); a growth factor (e.g., REGRANEX® (becaplermin; PDGF); FIBLAST® (trafermin; bFGF); STEMGEN® (ancestim; stem cell factor); keratinocyte growth factor; an acidic fibroblast growth factor, a stem cell factor, a basic fibroblast growth factor, a hepatocyte growth factor; and the like); a soluble receptor (e.g., a TNF-α-binding soluble receptor such as ENBREL® (etanercept); a soluble VEGF receptor; a soluble interleukin receptor; a soluble γ/δ T cell receptor; and the like); an enzyme (e.g., α-glucosidase; CEREZYME® (imiglucerase; β-glucocerebrosidase); CEREDASE® (alglucerase)); an enzyme activator (e.g., tissue plasminogen activator); a chemokine (e.g., IP-10; Mig; Groα/IL-8, RANTES; MIP-1α; MIP-1β; MCP-1; PF-4; and the like); an angiogenic agent (e.g., vascular endothelial growth factor (VEGF)); an anti-angiogenic agent (e.g., a soluble VEGF receptor); a protein vaccine; a neuroactive peptide such as bradykinin, cholecystokinin, gastrin, secretin, oxytocin, gonadotropin-releasing hormone, beta-endorphin, enkephalin, substance P, somatostatin, galanin, growth hormone-releasing hormone, bombesin, warfarin, dynorphin, neurotensin, motilin, thyrotropin, neuropeptide Y, luteinizing hormone, calcitonin, insulin, glucagon, vasopressin, angiotensin II, thyrotropin-releasing hormone, vasoactive intestinal peptide, a sleep peptide, etc.; other proteins such as a thrombolytic agent, an atrial natriuretic peptide, bone morphogenic protein, thrombopoietin, relaxin, glial fibrillary acidic protein, follicle stimulating hormone, a human alpha-1 antitrypsin, a leukemia inhibitory factor, a transforming growth factor, a tissue factor, an insulin-like growth factor, a luteinizing hormone, a follicle stimulating hormone, a macrophage activating factor, tumor necrosis factor, a neutrophil chemotactic factor, a nerve growth factor, a tissue inhibitor of metalloproteinases; a vasoactive intestinal peptide, angiogenin, angiotropin, fibrin; hirudin; a leukemia inhibitory factor; an IL-1 receptor antagonist (e.g., Kineret® (anakinra)); and the like. It will be readily appreciated that native forms of the above therapeutic proteins are also of interest as target polypeptides in the present invention.

The biological activity of a modified target polypeptide can be assayed according to methods known in the art. Modified aldehyde tagged-polypeptides that retain at least one desired pharmacologic activity of the corresponding parent protein are of interest. Examples of useful assays for particular therapeutic proteins include, but are not limited to, GMCSF (Eaves, A. C. and Eaves C. J., Erythropoiesis in culture. In: McCullock E A (edt) Cell culture techniques - Clinics in hematology. W B Saunders, Eastbourne, pp 371-91 (1984); Metcalf, D., International Journal of Cell Cloning 10: 116-25 (1992); Testa, N. G., et al., Assays for hematopoietic growth factors. In: Balkwill F R (edt) Cytokines A practical Approach, pp 229-44; IRL Press Oxford 1991); EPO (bioassay: Kitamura et al., J. Cell. Physiol. 140 p323 (1989)); Hirudin (platelet aggregation assay: Blood Coagul Fibrinolysis 7(2):259-61 (1996)); IFNα (anti-viral assay: Rubinstein et al., J. Virol. 37(2):755-8 (1981); anti-proliferative assay: Gao Y, et al Mol Cell Biol. 19(11):7305-13 (1999); and bioassay: Czarniecki et al., J. Virol. 49 p490 (1984)); GCSF (bioassay: Shirafuji et al., Exp. Hematol. 17 p116 (1989); proliferation of murine NFS-60 cells (Weinstein et al, Proc Natl Acad Sci 83:5010-4 (1986))); insulin (³H-glucose uptake assay: Steppan et al., Nature 409(6818):307-12 (2001)); hGH (Ba/F3-hGHR proliferation assay: J Clin Endocrinol Metab 85(11):4274-9 (2000); International standard for growth hormone: Horm Res, 51 Suppl 1:7-12 (1999)); factor X (factor X activity assay: Van Wijk et al. Thromb Res 22:681-686 (1981)); factor VII (coagulation assay using prothrombin clotting time: Belaaouaj et al., J. Biol. Chem. 275:27123-8 (2000); Diaz-Collier et al., Thromb Haemost 71:339-46 (1994)).

Immunogenic Compositions

The aldehyde tag-based technology disclosed herein also finds application in production of components of immunogenic compositions (e.g., therapeutic vaccines). For example, an aldehyde tag can be used to facilitate attachment of moieties that increase serum half-life of a polypeptide antigen, that increase immunogenicity of the polypeptide, or that link a non-amino acid antigen to a polypeptide carrier. In this regard, aldehyde tags can be used to facilitate modification of microbial antigens (e.g., a bacterial, viral, fungal, or parasitic antigen), tumor antigens, and other antigens which are of interest for administration to a subject to elicit an immune response in the subject. Also of interest is modification of antigens that are useful in eliciting antibodies which can be useful as research tools.

Further exemplary polypeptides of interest for modification using aldehyde tag(s) include those that are of interest for detection or functional monitoring in an assay (e.g., as a research tool, in a drug screening assay, and the like). Exemplary polypeptides of this type include receptors (e.g., G-protein coupled receptors (GPCRs, including orphan GPCRs)), receptor ligands (including naturally-occurring and synthetic), protein channels (e.g., ion channels (e.g., potassium channels, calcium channels, sodium channels, and the like)), and other polypeptides. In one embodiment, modification of cell surface-associated polypeptides (such as transmembrane polypeptides) is of particular interest, particularly where such modification is accomplished while the polypeptide is present in a membrane. Methods for modification of an aldehyde tagged-polypeptide under physiological conditions are described further below.

Formylglycine Generating Enzymes (FGEs)

The enzyme that oxidizes cysteine or serine in a sulfatase motif to FGly is referred to herein as a formylglycine generating enzyme (FGE). As discussed above, “FGE” is used herein to refer to FGly-generating enzymes that mediate conversion of a cysteine (C) of a sulfatase motif to FGly as well as FGly-generating enzymes that mediate conversion of serine (S) of a sulfatase motif to FGly. It should be noted that in general, the literature refers to FGly-generating enzymes that convert a C to FGly in a sulfatase motif as FGEs, and refers to enzymes that convert S to FGly in a sulfatase motif as Ats-B-like. However, for purposes of the present disclosure “FGE” is used generically to refer to both types of FGly-generating enzymes, with the understanding that an appropriate FGE will be selected according to the target reactive partner containing the appropriate sulfatase motif (i.e., C-containing or S-containing).

As evidenced by the ubiquitous presence of sulfatases having an FGly at the active site, FGEs are found in a wide variety of cell types, including both eukaryotes and prokaryotes. There are at least two forms of FGEs. Eukaryotic sulfatases contain a cysteine in their sulfatase motif and are modified by the “SUMF1-type” FGE (Cosma et al. Cell 2003, 113, (4), 445-56; Dierks et al. Cell 2003, 113, (4), 435-44). The FGly-generating enzyme (FGE) is encoded by the SUMF1 gene. Prokaryotic sulfatases can contain either a cysteine or a serine in their sulfatase motif and are modified either by the “SUMF1-type” FGE or the “AtsB-type” FGE, respectively (Szameit et al. J Biol Chem 1999, 274, (22), 15375-81). In eukaryotes, it is believed that this modification happens co-translationally or shortly after translation in the endoplasmic reticulum (ER) (Dierks et al. Proc Natl Acad Sci USA 1997, 94(22):11963-8). Without being held to theory, in prokaryotes it is thought that SUMF1-type FGE functions in the cytosol and AtsB-type FGE functions near or at the cell membrane. A SUMF2 FGE has also been described in deuterostomia, including vertebrates and echinodermata (see, e.g., Pepe et al. (2003) Cell 113, 445-456, Dierks et al. (2003) Cell 113, 435-444; Cosma et al. (2004) Hum. Mutat. 23, 576-581).

In general, the FGE used to facilitate conversion of cysteine or serine to FGly in a sulfatase motif of an aldehyde tag of a target polypeptide is selected according to the sulfatase motif present in the aldehyde tag. The FGE can be native to the host cell in which the aldehyde tagged polypeptide is expressed, or the host cell can be genetically modified to express an appropriate FGE. In some embodiments it may be desired to use a sulfatase motif compatible with a human FGE (e.g., the SUMF1-type FGE, see, e.g., Cosma et al. Cell 113, 445-56 (2003); Dierks et al. Cell 113, 435-44 (2003)), and express the aldehyde tagged protein in a human cell that expresses the FGE or in a host cell, usually a mammalian cell, genetically modified to express a human FGE.
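The selection rule described above (a cysteine-containing motif pairs with a SUMF1-type FGE, a serine-containing motif with an AtsB-type FGE) can be written as a small lookup. This is a sketch only: the function name is hypothetical, and the assumption that the convertible residue sits at the second position of the motif (index 1, as in an illustrative tag such as LCTPSR) is made for the example and should be adjusted to the actual tag used.

```python
def fge_type_for_motif(motif: str, position: int = 1) -> str:
    """Return the FGE class expected to convert the residue at `position`.

    `position` is the zero-based index of the cysteine or serine to be
    converted to FGly; index 1 is an assumption made for this sketch.
    """
    residue = motif[position].upper()
    if residue == "C":
        return "SUMF1-type"
    if residue == "S":
        return "AtsB-type"
    raise ValueError(
        f"residue {residue!r} at position {position} is neither cysteine nor serine"
    )
```

A host cell would then be chosen, or genetically modified, so that it expresses an FGE of the returned type.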

In general, an FGE for use in the methods disclosed herein can be obtained from naturally occurring sources or synthetically produced. For example, an appropriate FGE can be derived from biological sources which naturally produce an FGE or which are genetically modified to express a recombinant gene encoding an FGE. Nucleic acids encoding a number of FGEs are known in the art and readily available (see, e.g., Preusser et al. 2005 J. Biol. Chem. 280(15):14900-10 (Epub 2005 Jan 18); Fang et al. 2004 J Biol Chem. 279(15):14570-8 (Epub 2004 Jan. 28); Landgrebe et al. Gene. 2003 Oct 16;316:47-56; Dierks et al. 1998 FEBS Lett. 423(1):61-5; Dierks et al. Cell. 2003 May 16; 113(4):435-44; Cosma et al. (2003 May 16) Cell 113(4):445-56; Baenziger (2003 May 16) Cell 113(4):421-2 (review); Dierks et al. Cell. 2005 May 20;121(4):541-52; Roeser et al. (2006 Jan. 3) Proc Natl Acad Sci USA 103(1):81-6; Sardiello et al. (2005 Nov 1) Hum Mol Genet. 14(21):3203-17; WO 2004/072275; and GenBank Accession No. NM_182760). Accordingly, the disclosure provides for recombinant host cells genetically modified to express an FGE that is compatible for use with an aldehyde tag of a tagged target polypeptide.

In one embodiment, an FGE obtained from Mycobacterium tuberculosis (Mtb) is used in the methods disclosed herein. An exemplary Mtb FGE is described in detail in the Examples below. An exemplary Mtb FGE is one having the amino acid sequence provided at GenBank Acc. No. NP_215226 (gi:15607852):

mltelvdlpg gsfrmgstrf ypeeapihtv tvrafaverh
pvtnaqfaef vsatgyvtva eqpldpglyp gvdaadlcpg
amvfcptagp vdlrdwrqww dwvpgacwrh pfgrdsdiad
raghpvvqva ypdavayarw agrrlpteae weyaarggtt
atyawgdqek pggmlmantw qgrfpyrndg algwvgtspv
grfpangfgl ldmignvwew tttefyphhr idppstacca
pvklataadp tisqtlkggs hlcapeychr yrpaarspqs
qdtatthigf rcvadpvsg

Thus Mtb FGE, and nucleic acid encoding Mtb FGE, are contemplated for use in the present methods. In addition, the methods used to identify and characterize the Mtb FGE are applicable to the identification and characterization of other FGEs useful in the methods disclosed herein.

Provided with the extensive amino acid sequence information and characterization of FGEs provided herein as well as in the art, it will be readily apparent to the ordinarily skilled artisan that FGEs include naturally-occurring FGEs as well as modified FGEs sharing sequence identity with a known FGE (e.g., a naturally-occurring FGE) and which retain function in specific modification of a serine or cysteine of a sulfatase motif.

In general, FGEs of interest include those having at least 60%, usually 75%, usually 80%, more usually 90%-95% nucleotide or amino acid residue identity, when compared and aligned for maximum correspondence with a nucleotide sequence or amino acid sequence of a parent FGE, as measured using a sequence comparison algorithm available in the art or by visual inspection. Usually a recited sequence identity exists over a region of the sequences that is at least about 50 residues in length, more usually over a region of at least about 100 residues, and more usually over at least about 150 residues up to the full-length of the coding region or protein, with the proviso that the region of comparison includes an active site of the FGE required for enzymatic activity.
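The identity and region-length criteria above can be checked on a pair of already-aligned sequences. The following is a minimal sketch, not a prescribed method: the function names are hypothetical, gapped columns are simply skipped (one of several conventions for computing percent identity), and the active-site proviso is not checked by the code.

```python
from typing import Tuple

GAP = "-"


def percent_identity(aligned_a: str, aligned_b: str) -> Tuple[float, int]:
    """Return (percent identity, compared columns) for two aligned sequences.

    Columns in which either sequence has a gap are skipped; this is an
    assumption of the sketch, not the only possible convention.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == GAP or b == GAP:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0, 0
    return 100.0 * matches / compared, compared


def meets_identity_criteria(
    aligned_a: str,
    aligned_b: str,
    min_identity: float = 60.0,
    min_region: int = 50,
) -> bool:
    """True if identity >= min_identity over at least min_region compared residues."""
    identity, compared = percent_identity(aligned_a, aligned_b)
    return compared >= min_region and identity >= min_identity
```

The defaults mirror the lowest thresholds recited above (60% identity over about 50 residues); stricter values such as 90% over 150 residues can be passed as arguments.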

For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are input into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters.

Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by visual inspection (see generally, Current Protocols in Molecular Biology, F.M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., (1995 Supplement) (Ausubel)).

Examples of algorithms that are suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al. (1990) J. Mol. Biol. 215: 403-410 and Altschul et al. (1997) Nucleic Acids Res. 25: 3389-3402, respectively. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al, supra).
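The word-seeding step can be illustrated with a short sketch that finds exact word matches of length W between a query and a database sequence. This is a simplification and not the BLAST implementation: the function names are hypothetical, and only exact matches are reported. Scoring near-matches against the neighborhood threshold T would additionally require a scoring matrix, which is omitted here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def index_words(sequence: str, w: int) -> Dict[str, List[int]]:
    """Map every word of length w in `sequence` to its start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(sequence) - w + 1):
        index[sequence[i:i + w]].append(i)
    return index


def exact_word_hits(query: str, subject: str, w: int) -> List[Tuple[int, int]]:
    """Return (query_start, subject_start) for each exactly matching word."""
    subject_index = index_words(subject, w)
    hits: List[Tuple[int, int]] = []
    for q in range(len(query) - w + 1):
        for s in subject_index.get(query[q:q + w], []):
            hits.append((q, s))
    return hits
```

Each returned pair is a candidate seed from which a longer high scoring sequence pair could be grown.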

These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, M=5, N=-4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1992)).
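The three halting conditions for extension can be sketched for the ungapped nucleotide case. This is an illustration under stated assumptions rather than the BLAST implementation: the function names are hypothetical, M=5 and N=-4 follow the defaults recited above, and the drop-off value X=20 is an arbitrary choice made for the example.

```python
from typing import Tuple


def _extend(query: str, subject: str, q: int, s: int, step: int,
            base: int, m: int, n: int, x: int) -> Tuple[int, int]:
    """Extend in one direction; return (best score gain, residues kept)."""
    running = 0
    best = 0
    best_len = 0
    length = 0
    # Halting condition 3: the end of either sequence is reached.
    while 0 <= q < len(query) and 0 <= s < len(subject):
        running += m if query[q] == subject[s] else n
        length += 1
        if running > best:
            best, best_len = running, length
        # Halting condition 1: score fell off by X from its maximum.
        # Halting condition 2: cumulative score is zero or below.
        if best - running >= x or base + running <= 0:
            break
        q += step
        s += step
    return best, best_len


def extend_hit(query: str, subject: str, q_start: int, s_start: int, w: int,
               m: int = 5, n: int = -4, x: int = 20) -> Tuple[int, int, int]:
    """Extend a word hit of length w; return (score, query_begin, query_end).

    query_end is exclusive. x=20 is an illustrative value, not a documented
    default.
    """
    seed = sum(m if query[q_start + i] == subject[s_start + i] else n
               for i in range(w))
    right_gain, right_len = _extend(query, subject, q_start + w, s_start + w,
                                    1, seed, m, n, x)
    left_gain, left_len = _extend(query, subject, q_start - 1, s_start - 1,
                                  -1, seed, m, n, x)
    return seed + left_gain + right_gain, q_start - left_len, q_start + w + right_len
```

Only the residues up to the maximum score in each direction are kept, so trailing mismatches explored before a halting condition triggers do not lower the reported score.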

In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul, Proc. Nat'l. Acad. Sci. USA 90:5873-5877 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.1, more usually less than about 0.01, and most usually less than about 0.001.

Residue positions that are not identical may differ by conservative amino acid substitutions, which will be readily apparent from analysis of the alignments as discussed above. Conservative amino acid substitutions refer to the interchangeability of residues having similar side chains. For example, amino acid groups defining residues which can be interchanged for another residue within the group and constitute a conservative amino acid substitution include: a group of amino acids having aliphatic side chains, which is glycine, alanine, valine, leucine, proline, and isoleucine (“aliphatic amino acid”); a group of amino acids having aliphatic-hydroxyl side chains, which is serine and threonine (“aliphatic, hydroxyl amino acid”, which are also encompassed within “polar, uncharged amino acid”); a group of amino acids having amide-containing side chains, which is asparagine and glutamine (“amide-containing amino acid”, which are also encompassed within “polar, uncharged amino acid”); a group of amino acids having aromatic side chains, which is phenylalanine, tyrosine, and tryptophan (“aromatic amino acid”); a group of amino acids having basic side chains (at physiological pH), which is lysine, arginine, and histidine (“basic amino acid”); a group of amino acids having sulfur-containing side chains, which is cysteine and methionine (“sulfur-containing amino acid”); a group of amino acids that are polar and uncharged (at physiological pH), which includes serine, threonine, asparagine, and glutamine (“polar, uncharged amino acid”); and a group of amino acids having charged side chains (at physiological pH), which is aspartic acid, glutamic acid, histidine, lysine, and arginine (“charged amino acid”). Conservative amino acid substitution groups are exemplified by: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, and asparagine-glutamine.
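The exemplified substitution groups lend themselves to a simple membership test. The sketch below encodes only the five groups listed in the final sentence above, using one-letter amino acid codes; the function name is hypothetical, and treating identical residues as "not a substitution" is a choice made for this example.

```python
# Exemplified groups: Val-Leu-Ile, Phe-Tyr, Lys-Arg, Ala-Val, Asn-Gln.
CONSERVATIVE_GROUPS = [
    frozenset("VLI"),
    frozenset("FY"),
    frozenset("KR"),
    frozenset("AV"),
    frozenset("NQ"),
]


def is_conservative(a: str, b: str) -> bool:
    """True if replacing residue a with b stays within an exemplified group."""
    a, b = a.upper(), b.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)
```

Note that the groups overlap without being transitive: valine pairs with both alanine and leucine, but alanine-leucine is not itself among the exemplified groups.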

Where a cell-free method is used to convert a sulfatase motif-containing polypeptide, an isolated FGE can be used. Any convenient protein purification procedures may be used to isolate an FGE, see, e.g., Guide to Protein Purification, (Deutscher ed.) (Academic Press, 1990). For example, a lysate may be prepared from a cell that produces a desired FGE, and purified using HPLC, exclusion chromatography, gel electrophoresis, affinity chromatography, and the like.

Expression Vectors and Host Cells for Production of Aldehyde Tagged-Polypeptides

The disclosure provides nucleic acids encoding aldehyde tags and aldehyde tagged polypeptides, as well as constructs and host cells containing such nucleic acids. Such nucleic acids comprise a sequence of DNA having an open reading frame that encodes an aldehyde tag or aldehyde tagged polypeptide and, in most embodiments, is capable, under appropriate conditions, of being expressed. “Nucleic acid” encompasses DNA, cDNA, mRNA, and vectors comprising such nucleic acids.

Nucleic acids encoding aldehyde tags, as well as aldehyde tagged polypeptides, are provided herein. Such nucleic acids include genomic DNAs modified by insertion of an aldehyde tag-encoding sequence and cDNAs of aldehyde tagged polypeptides. The term “cDNA” as used herein is intended to include all nucleic acids that share the arrangement of sequence elements found in a native mature mRNA species (including splice variants), where sequence elements are exons and 3′ and 5′ non-coding regions. Normally mRNA species have contiguous exons, with the intervening introns, when present, being removed by nuclear RNA splicing, to create a continuous open reading frame encoding a protein according to the subject invention.

The term “gene” intends a nucleic acid having an open reading frame encoding a polypeptide (e.g., an aldehyde tagged polypeptide), and, optionally, any introns, and can further include adjacent 5′ and 3′ non-coding nucleotide sequences involved in the regulation of expression (e.g., regulators of transcription and/or translation, e.g., promoters, enhancers, translational regulatory signals, and the like), up to about 20 kb beyond the coding region, but possibly further in either direction, which adjacent 5′ and 3′ non-coding nucleotide sequences may be endogenous or heterologous to the coding sequence. Transcriptional and translational regulatory sequences, such as promoters, enhancers, etc., may be included, including about 1 kb, but possibly more, of flanking genomic DNA at either the 5′ or 3′ end of the transcribed region.

Nucleic acids contemplated herein can be provided as part of a vector (also referred to as a construct), a wide variety of which are known in the art and need not be elaborated upon herein. Exemplary vectors include, but are not limited to, plasmids; cosmids; viral vectors (e.g., retroviral vectors); non-viral vectors; artificial chromosomes (YAC's, BAC's, etc.); mini-chromosomes; and the like.

The choice of vector will depend upon a variety of factors such as the type of cell in which propagation is desired and the purpose of propagation. Certain vectors are useful for amplifying and making large amounts of the desired DNA sequence. Other vectors are suitable for expression in cells in culture. Still other vectors are suitable for transfer and expression in cells in a whole animal. The choice of appropriate vector is well within the skill of the art. Many such vectors are available commercially.

To prepare the constructs, a polynucleotide is inserted into a vector, typically by means of DNA ligase attachment to a cleaved restriction enzyme site in the vector. Alternatively, the desired nucleotide sequence can be inserted by homologous recombination or site-specific recombination. Typically homologous recombination is accomplished by attaching regions of homology to the vector on the flanks of the desired nucleotide sequence, while site-specific recombination can be accomplished through use of sequences that facilitate site-specific recombination (e.g., cre-lox, att sites, etc.). Nucleic acid containing such sequences can be added by, for example, ligation of oligonucleotides, or by polymerase chain reaction using primers comprising both the region of homology and a portion of the desired nucleotide sequence.

Vectors can provide for extrachromosomal maintenance in a host cell or can provide for integration into the host cell genome. Vectors are amply described in numerous publications well known to those in the art, including, e.g., Short Protocols in Molecular Biology, (1999) F. Ausubel, et al., eds., Wiley & Sons. Vectors may provide for expression of the nucleic acids encoding a polypeptide of interest (e.g., an aldehyde tagged polypeptide, an FGE, etc.), may provide for propagating the subject nucleic acids, or both.

Exemplary vectors that may be used include but are not limited to those derived from recombinant bacteriophage DNA, plasmid DNA or cosmid DNA. For example, plasmid vectors such as pBR322, pUC 19/18, pUC 118, 119 and the M13 mp series of vectors may be used. Bacteriophage vectors may include λgt10, λgt11, λgt18-23, λZAP/R and the EMBL series of bacteriophage vectors. Cosmid vectors that may be utilized include, but are not limited to, pJB8, pCV 103, pCV 107, pCV 108, pTM, pMCS, pNNL, pHSG274, COS202, COS203, pWE15, pWE16 and the charomid 9 series of vectors. Alternatively, recombinant virus vectors may be engineered, including but not limited to those derived from viruses such as herpes virus, retroviruses, vaccinia virus, poxviruses, adenoviruses, adeno-associated viruses or bovine papilloma virus.

For expression of a polypeptide of interest, an expression cassette may be employed. Thus, the present invention provides a recombinant expression vector comprising a subject nucleic acid. The expression vector provides transcriptional and translational regulatory sequences, and may provide for inducible or constitutive expression, where the coding region is operably linked under the transcriptional control of the transcriptional initiation region, and a transcriptional and translational termination region. These control regions may be native to the gene encoding the polypeptide (e.g., the target polypeptide or the FGE), or may be derived from exogenous sources. In general, the transcriptional and translational regulatory sequences may include, but are not limited to, promoter sequences, ribosomal binding sites, transcriptional start and stop sequences, translational start and stop sequences, and enhancer or activator sequences. In addition to constitutive and inducible promoters, strong promoters (e.g., T7, CMV, and the like) find use in the constructs described herein, particularly where high expression levels are desired in an in vivo (cell-based) or in an in vitro expression system. Further exemplary promoters include mouse mammary tumor virus (MMTV) promoters, Rous sarcoma virus (RSV) promoters, adenovirus promoters, the promoter from the immediate early gene of human CMV (Boshart et al., Cell 41:521-530, 1985), and the promoter from the long terminal repeat (LTR) of RSV (Gorman et al., Proc. Natl. Acad. Sci. USA 79:6777-6781, 1982). The promoter can also be provided by, for example, a 5′UTR of a retrovirus.

Expression vectors generally have convenient restriction sites located near the promoter sequence to provide for the insertion of nucleic acid sequences encoding proteins of interest. A selectable marker operative in the expression host may be present to facilitate selection of cells containing the vector. In addition, the expression construct may include additional elements. For example, the expression vector may have one or two replication systems, thus allowing it to be maintained in organisms, for example in mammalian or insect cells for expression and in a prokaryotic host for cloning and amplification. In addition, the expression construct may contain a selectable marker gene to allow the selection of transformed host cells. Selection genes are well known in the art and will vary with the host cell used.

An aldehyde tag cassette is also provided herein, which includes a nucleic acid encoding an aldehyde tag, and suitable restriction sites flanking the tag-encoding sequence for in-frame insertion of a nucleic acid encoding a target polypeptide. Such an expression construct can provide for addition of an aldehyde tag at the N-terminus or C-terminus of a target polypeptide. The aldehyde tag cassette can be operably linked to a promoter sequence to provide for expression of the resulting aldehyde tagged polypeptide, and may further include one or more selectable markers.

The present disclosure also provides expression cassettes for production of aldehyde tagged-polypeptides (e.g., having an aldehyde tag positioned at an N-terminus or at a C-terminus). Such expression cassettes generally include a first nucleic acid comprising an aldehyde tag-encoding sequence, and at least one restriction site for insertion of a second nucleic acid encoding a polypeptide of interest. The restriction sites can be positioned 5′ and/or 3′ of the aldehyde tag-encoding sequence. Insertion of the polypeptide-encoding sequence in-frame with the aldehyde tag-encoding sequence provides for production of a recombinant nucleic acid encoding a fusion protein that is an aldehyde tagged polypeptide as described herein. Constructs containing such an expression cassette generally also include a promoter operably linked to the expression cassette to provide for expression of the aldehyde tagged-polypeptide produced. Other components of the expression construct can include selectable markers and other suitable elements.

Expression constructs encoding aldehyde tagged polypeptides can also be generated using amplification methods (e.g., polymerase chain reaction (PCR)), where at least one amplification primer (i.e., at least one of a forward or reverse primer) includes a nucleic acid sequence encoding an aldehyde tag. For example, an amplification primer having an aldehyde tag-encoding sequence is designed to provide for amplification of a nucleic acid encoding a target polypeptide of interest. The extension product that results from polymerase-mediated synthesis from the aldehyde tag-containing forward primer produces a nucleic acid amplification product encoding a fusion protein composed of an aldehyde tagged-target polypeptide. The amplification product is then inserted into an expression construct of choice to provide an aldehyde tagged polypeptide expression construct.
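As a rough illustration of the primer design described above, the following Python sketch assembles a tag-containing forward primer from its parts and checks that the tag stays in frame with the start codon. The segment values reproduce the ald₆-hGH start primer listed in the Examples. The split into "clamp", "site", "spacer", "tag" and gene-specific segments is our own reading of that primer, and the function and parameter names are hypothetical.

```python
# Illustrative sketch only: segment names are our own labels, not terms
# used in the disclosure.
NCOI_SITE = "CCATGG"  # NcoI recognition site; it contains the ATG start codon


def build_tagged_forward_primer(clamp, site, spacer, tag, gene_5prime):
    """Join the primer segments 5'->3' after checking the reading frame."""
    if "ATG" not in site:
        raise ValueError("this sketch expects the start codon inside the site")
    if len(tag) % 3 != 0:
        raise ValueError("tag-encoding sequence must be whole codons")
    # Bases from the A of ATG up to the first gene-specific base.
    bases_before_gene = len(site) - site.index("ATG") + len(spacer) + len(tag)
    if bases_before_gene % 3 != 0:
        raise ValueError("spacer and tag would shift the reading frame")
    return clamp + site + spacer + tag + gene_5prime


primer = build_tagged_forward_primer(
    clamp="CTATGCTA",                  # extra 5' bases (assumed role)
    site=NCOI_SITE,
    spacer="CG",                       # completes the codon after ATG
    tag="CTGTGCACACCATCGCGG",          # codons for L-C-T-P-S-R
    gene_5prime="ACCATTCCCTTATCCAGGC",  # gene-specific 3' portion
)
print(primer)
```

The sketch assumes that the gene-specific segment begins on a codon boundary of the target coding sequence. It does not check melting temperature or secondary structure, which a practical primer design would also consider.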

Host Cells

Any of a number of suitable host cells can be used in the production of an aldehyde tagged polypeptide. The host cell used for production of an aldehyde tagged-polypeptide can optionally provide for FGE-mediated conversion, so that the polypeptide produced contains an FGly-containing aldehyde tag following expression and post-translational modification by FGE. Alternatively, the host cell can provide for production of an unconverted aldehyde tagged polypeptide (e.g., due to lack of expression of an FGE that facilitates conversion of the aldehyde tag).

In general, the polypeptides described herein may be expressed in prokaryotes or eukaryotes in accordance with conventional ways, depending upon the purpose for expression. Thus, the present invention further provides a host cell, e.g., a genetically modified host cell, that comprises a nucleic acid encoding an aldehyde tagged polypeptide. The host cell can further optionally comprise a recombinant FGE, which may be endogenous or heterologous to the host cell.

Host cells for production (including large scale production) of an unconverted or (where the host cell expresses a suitable FGE) converted aldehyde tagged polypeptide, or for production of an FGE (e.g., for use in a cell-free method) can be selected from any of a variety of available host cells. Exemplary host cells include those of a prokaryotic or eukaryotic unicellular organism, such as bacteria (e.g., Escherichia coli strains, Bacillus spp. (e.g., B. subtilis), and the like), yeast or fungi (e.g., S. cerevisiae, Pichia spp., and the like), and other such host cells can be used. Exemplary host cells originally derived from a higher organism such as insects or vertebrates, particularly mammals (e.g., CHO, HEK, and the like), may be used as the expression host cells.

Specific expression systems of interest include bacterial, yeast, insect cell and mammalian cell derived expression systems. Representative systems from each of these categories are provided below.

Bacteria. Expression systems in bacteria include those described in Chang et al., Nature (1978) 275:615; Goeddel et al., Nature (1979) 281:544; Goeddel et al., Nucleic Acids Res. (1980) 8:4057; EP 0 036,776; U.S. Pat. No. 4,551,433; DeBoer et al., Proc. Natl. Acad. Sci. (USA) (1983) 80:21-25; and Siebenlist et al., Cell (1980) 20:269.

Yeast. Expression systems in yeast include those described in Hinnen et al., Proc. Natl. Acad. Sci. (USA) (1978) 75:1929; Ito et al., J. Bacteriol. (1983) 153:163; Kurtz et al., Mol. Cell. Biol. (1986) 6:142; Kunze et al., J. Basic Microbiol. (1985) 25:141; Gleeson et al., J. Gen. Microbiol. (1986) 132:3459; Roggenkamp et al., Mol. Gen. Genet. (1986) 202:302; Das et al., J. Bacteriol. (1984) 158:1165; De Louvencourt et al., J. Bacteriol. (1983) 154:737; Van den Berg et al., Bio/Technology (1990) 8:135; Cregg et al., Mol. Cell. Biol. (1985) 5:3376; U.S. Pat. Nos. 4,837,148 and 4,929,555; Beach and Nurse, Nature (1981) 300:706; Davidow et al., Curr. Genet. (1985) 10:380; Gaillardin et al., Curr. Genet. (1985) 10:49; Ballance et al., Biochem. Biophys. Res. Commun. (1983) 112:284-289; Tilburn et al., Gene (1983) 26:205-221; Yelton et al., Proc. Natl. Acad. Sci. (USA) (1984) 81:1470-1474; Kelly and Hynes, EMBO J. (1985) 4:475-479; EP 0 244,234; and WO 91/00357.

Insect Cells. Expression of heterologous genes in insects is accomplished as described in U.S. Pat. No. 4,745,051; Friesen et al., “The Regulation of Baculovirus Gene Expression”, in: The Molecular Biology Of Baculoviruses (1986) (W. Doerfler, ed.); EP 0 127,839; EP 0 155,476; and Vlak et al., J. Gen. Virol. (1988) 69:765-776; Miller et al., Ann. Rev. Microbiol. (1988) 42:177; Carbonell et al., Gene (1988) 73:409; Maeda et al., Nature (1985) 315:592-594; Lebacq-Verheyden et al., Mol. Cell. Biol. (1988) 8:3129; Smith et al., Proc. Natl. Acad. Sci. (USA) (1985) 82:8844; Miyajima et al., Gene (1987) 58:273; and Martin et al., DNA (1988) 7:99. Numerous baculoviral strains and variants and corresponding permissive insect host cells from hosts are described in Luckow et al., Bio/Technology (1988) 6:47-55, Miller et al., Genetic Engineering (1986) 8:277-279, and Maeda et al., Nature (1985) 315:592-594.

Mammalian Cells. Mammalian expression is accomplished as described in Dijkema et al., EMBO J. (1985) 4:761, Gorman et al., Proc. Natl. Acad. Sci. (USA) (1982) 79:6777, Boshart et al., Cell (1985) 41:521 and U.S. Pat. No. 4,399,216. Other features of mammalian expression are facilitated as described in Ham and Wallace, Meth. Enz. (1979) 58:44, Barnes and Sato, Anal. Biochem. (1980) 102:255, U.S. Pat. Nos. 4,767,704, 4,657,866, 4,927,762, 4,560,655, WO 90/103430, WO 87/00195, and U.S. RE 30,985.

When any of the above host cells, or other appropriate host cells or organisms, are used to replicate and/or express the polynucleotides or nucleic acids of the invention, the resulting replicated nucleic acid, RNA, expressed protein or polypeptide, is within the scope of the invention as a product of the host cell or organism.

The product can be recovered by any appropriate means known in the art.

Further, any convenient protein purification procedures may be employed, where suitable protein purification methodologies are described in Guide to Protein Purification, (Deutscher ed.) (Academic Press, 1990). For example, a lysate may be prepared from a cell comprising the expression vector expressing a polypeptide of interest, and purified using HPLC, exclusion chromatography, gel electrophoresis, affinity chromatography, and the like.

Moieties for Modification of Polypeptides

The aldehyde tagged, FGly-containing polypeptides can be subjected to modification to provide for attachment of a wide variety of moieties. Exemplary molecules of interest include, but are not necessarily limited to, a detectable label, a small molecule, a peptide, and the like.

The moiety of interest is provided as a component of a reactive partner for reaction with an aldehyde of the FGly residue of a converted aldehyde tag of the tagged polypeptide. Since the methods of tagged polypeptide modification are compatible with conventional chemical processes, the methods of the invention can exploit a wide range of commercially available reagents to accomplish attachment of a moiety of interest to a FGly residue of an aldehyde tagged polypeptide. For example, aminooxy, hydrazide, hydrazine, or thiosemicarbazide derivatives of a number of moieties of interest are suitable reactive partners, and are readily available or can be generated using standard chemical methods.

For example, an aminooxy-PEG can be generated from monoamino-PEGs and aminooxyglycine using standard protocols. The aminooxy-PEG can then be reacted with a converted aldehyde tagged polypeptide to provide for attachment of the PEG moiety. Delivery of a biotin moiety to a converted aldehyde tagged polypeptide can be accomplished using aminooxy biotin, biotin hydrazide or 2,4 dinitrophenylhydrazine.

Provided the present disclosure, the ordinarily skilled artisan can readily adapt any of a variety of moieties to provide a reactive partner for conjugation to an aldehyde tagged polypeptide as contemplated herein. The ordinarily skilled artisan will appreciate that factors such as pH and steric hindrance (i.e., the accessibility of the aldehyde tag to reaction with a reactive partner of interest) are of importance. Modifying reaction conditions to provide for optimal conjugation conditions is well within the skill of the ordinary artisan, and is routine in the art. In general, it is normally desirable to conduct conjugation reactions at a pH below 7, with a pH of about 5.5, about 6, about 6.5, usually about 5.5 being optimal. Where conjugation is conducted with an aldehyde tagged polypeptide present in or on a living cell, the conditions are selected so as to be physiologically compatible. For example, the pH can be dropped temporarily for a time sufficient to allow for the reaction to occur but within a period tolerated by the cell having an aldehyde tag (e.g., from about 30 min to 1 hour). Physiological conditions for conducting modification of aldehyde tagged polypeptides on a cell surface can be similar to those used in a ketone-azide reaction in modification of cells bearing cell-surface azides (see, e.g., U.S. Pat. No. 6,570,040).

In general, the moiety or moieties can provide for one or more of a wide variety of functions or features. Exemplary moieties include detectable labels (e.g., dye labels (e.g., chromophores, fluorophores), biophysical probes (spin labels, NMR probes), FRET-type labels (e.g., at least one member of a FRET pair, including at least one member of a fluorophore/quencher pair), BRET-type labels (e.g., at least one member of a BRET pair), immunodetectable tags (e.g., FLAG, His(6), and the like), localization tags (e.g., to identify association of a tagged polypeptide at the tissue or molecular cell level (e.g., association with a tissue type, or particular cell membrane)), and the like); light-activated dynamic moieties (e.g., azobenzene mediated pore closing, azobenzene mediated structural changes, photodecaging recognition motifs); water soluble polymers (e.g., PEGylation); purification tags (e.g., to facilitate isolation by affinity chromatography (e.g., attachment of a FLAG epitope)); membrane localization domains (e.g., lipids or GPI-type anchors); immobilization tags (e.g., to facilitate attachment of the polypeptide to a surface, including selective attachment); drugs (e.g., to facilitate drug targeting, e.g., through attachment of the drug to an antibody); targeted delivery moieties, (e.g., ligands for binding to a target receptor (e.g., to facilitate viral attachment, attachment of a targeting protein present on a liposome, etc.)), and the like.

Specific, non-limiting examples are provided below.

Detectable Labels.

The compositions and methods of the invention can be used to deliver a detectable label to an aldehyde tagged polypeptide. Exemplary detectable labels include, but are not necessarily limited to, fluorescent molecules (e.g., autofluorescent molecules, molecules that fluoresce upon contact with a reagent, etc.), radioactive labels (e.g., ¹¹¹In, ¹²⁵I, ¹³¹I, ²¹²Bi, ⁹⁰Y, ¹⁸⁶Re, and the like); biotin (e.g., to be detected through reaction of biotin and avidin); fluorescent tags; imaging reagents, and the like. Detectable labels also include peptides or polypeptides that can be detected by antibody binding, e.g., by binding of a detectably labeled antibody or by detection of bound antibody through a sandwich-type assay.

Attachment of Target Molecules to a Support.

The methods can provide for conjugation of an aldehyde tagged polypeptide to a moiety to facilitate attachment of the polypeptide to a solid substratum (e.g., to facilitate assays), or to a moiety to facilitate easy separation (e.g., a hapten recognized by an antibody bound to a magnetic bead). In one embodiment, the methods of the invention are used to provide for attachment of a protein to an array (e.g., chip) in a defined orientation. For example, a polypeptide having an aldehyde tag at a selected site (e.g., at or near the N-terminus) can be generated, and the methods and compositions of the invention used to deliver a moiety to the converted aldehyde tag. The moiety can then be used as the attachment site for affixing the polypeptide to a support (e.g., solid or semi-solid support, particularly a support suitable for use as a microchip in high-throughput assays).

Attachment of Molecules for Delivery to a Target Site.

The reactive partner for the aldehyde tagged polypeptide can comprise a small molecule drug, toxin, or other molecule for delivery to the cell and which can provide for a pharmacological activity or can serve as a target for delivery of other molecules.

Also contemplated is use of a reactive partner that comprises one of a pair of binding partners (e.g., a ligand, a ligand-binding portion of a receptor, a receptor-binding portion of a ligand, etc.). For example, the reactive partner can comprise a polypeptide that serves as a viral receptor and, upon binding with a viral envelope protein or viral capsid protein, facilitates attachment of virus to the cell surface on which the modified aldehyde tagged protein is expressed. Alternatively, the reactive partner comprises an antigen that is specifically bound by an antibody (e.g., monoclonal antibody), to facilitate detection and/or separation of host cells expressing the modified aldehyde tagged polypeptide.

Water-Soluble Polymers

A moiety of particular interest is a water-soluble polymer. A “water-soluble polymer” refers to a polymer that is soluble in water and is usually substantially non-immunogenic, and usually has an atomic molecular weight greater than about 1,000 Daltons. The methods and compositions described herein can be used to attach one or more water-soluble polymers to an aldehyde tagged polypeptide. Attachment of a water-soluble polymer (e.g., PEG) to a polypeptide, particularly a pharmaceutically active (therapeutic) polypeptide, can be desirable as such modification can increase therapeutic index by increasing serum half-life as a result of increased proteolytic stability and/or decreased renal clearance.

Additionally, attachment of one or more polymers (e.g., PEGylation) can reduce immunogenicity of protein pharmaceuticals.

In some embodiments, the water-soluble polymer has an effective hydrodynamic molecular weight of greater than about 10,000 Da, greater than about 20,000 to 500,000 Da, greater than about 40,000 Da to 300,000 Da, greater than about 50,000 Da to 70,000 Da, usually greater than about 60,000 Da. By “effective hydrodynamic molecular weight” is intended the effective water-solvated size of a polymer chain as determined by aqueous-based size exclusion chromatography (SEC). When the water-soluble polymer contains polymer chains having polyalkylene oxide repeat units, such as ethylene oxide repeat units, each chain can have an atomic molecular weight of between about 200 Da and about 80,000 Da, or between about 1,500 Da and about 42,000 Da, with 2,000 to about 20,000 Da being of particular interest. Unless referred to specifically, molecular weight is intended to refer to atomic molecular weight. Linear, branched, and terminally charged water soluble polymers (e.g., PEG) are of particular interest.
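To give a sense of scale for the chain molecular weights recited above, the following Python sketch estimates how many ethylene oxide repeat units a chain of a given atomic molecular weight contains. It is a back-of-the-envelope calculation under stated assumptions: end groups are ignored, standard average atomic masses are used, and the function name is hypothetical.

```python
# Average mass of one ethylene oxide repeat unit, -(CH2-CH2-O)-, i.e. C2H4O.
EO_UNIT_DA = 2 * 12.011 + 4 * 1.008 + 15.999  # about 44.05 Da


def approx_repeat_units(chain_mw_da: float) -> int:
    """Estimate repeat units per chain, ignoring end groups (a simplification)."""
    return round(chain_mw_da / EO_UNIT_DA)


for mw in (2_000, 20_000):
    print(f"{mw} Da chain: about {approx_repeat_units(mw)} repeat units")
```

Under these assumptions, the 2,000 to 20,000 Da range of particular interest corresponds to roughly 45 to 454 ethylene oxide units per chain.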

Polymers useful as moieties to be attached to an aldehyde tagged polypeptide can have a wide range of molecular weights, and polymer subunits. These subunits may include a biological polymer, a synthetic polymer, or a combination thereof. Examples of such water-soluble polymers include: dextran and dextran derivatives, including dextran sulfate, P-amino cross linked dextrin, and carboxymethyl dextrin, cellulose and cellulose derivatives, including methylcellulose and carboxymethyl cellulose, starch and dextrines, and derivatives and hydrolysates of starch, polyalkylene glycol and derivatives thereof, including polyethylene glycol, methoxypolyethylene glycol, polyethylene glycol homopolymers, polypropylene glycol homopolymers, copolymers of ethylene glycol with propylene glycol, wherein said homopolymers and copolymers are unsubstituted or substituted at one end with an alkyl group, heparin and fragments of heparin, polyvinyl alcohol and polyvinyl ethyl ethers, polyvinylpyrrolidone, aspartamide, and polyoxyethylated polyols, with the dextran and dextran derivatives, dextrine and dextrine derivatives. It will be appreciated that various derivatives of the specifically recited water-soluble polymers are also contemplated.

Water-soluble polymers such as those described above are well known, particularly the polyalkylene oxide based polymers such as polyethylene glycol “PEG” (see, e.g., “Poly(ethylene glycol) Chemistry: Biotechnical and Biomedical Applications”, J. M. Harris, Ed., Plenum Press, New York, N.Y. (1992); and “Poly(ethylene glycol) Chemistry and Biological Applications”, J. M. Harris and S. Zalipsky, Eds., ACS (1997); and International Patent Applications: WO 90/13540, WO 92/00748, WO 92/16555, WO 94/04193, WO 94/14758, WO 94/17039, WO 94/18247, WO 94/28937, WO 95/11924, WO 96/00080, WO 96/23794, WO 98/07713, WO 98/41562, WO 98/48837, WO 99/30727, WO 99/32134, WO 99/33483, WO 99/53951, WO 01/26692, WO 95/13312, WO 96/21469, WO 97/03106, WO 99/45964, and U.S. Pat. Nos. 4,179,337; 5,075,046; 5,089,261; 5,100,992; 5,134,192; 5,166,309; 5,171,264; 5,213,891; 5,219,564; 5,275,838; 5,281,698; 5,298,643; 5,312,808; 5,321,095; 5,324,844; 5,349,001; 5,352,756; 5,405,877; 5,455,027; 5,446,090; 5,470,829; 5,478,805; 5,567,422; 5,605,976; 5,612,460; 5,614,549; 5,618,528; 5,672,662; 5,637,749; 5,643,575; 5,650,388; 5,681,567; 5,686,110; 5,730,990; 5,739,208; 5,756,593; 5,808,096; 5,824,778; 5,824,784; 5,840,900; 5,874,500; 5,880,131; 5,900,461; 5,902,588; 5,919,442; 5,919,455; 5,932,462; 5,965,119; 5,965,566; 5,985,263; 5,990,237; 6,011,042; 6,013,283; 6,077,939; 6,113,906; 6,127,355; 6,177,087; 6,180,095; 6,194,580; 6,214,966).

Exemplary polymers of interest include those containing a polyalkylene oxide, polyamide alkylene oxide, or derivatives thereof, including polyalkylene oxide and polyamide alkylene oxide comprising an ethylene oxide repeat unit of the formula —(CH₂—CH₂—O)—. Further exemplary polymers of interest include a polyamide having a molecular weight greater than about 1,000 Daltons of the formula —[C(O)—X—C(O)—NH—Y—NH]_(n)— or —[NH—Y—NH—C(O)—X—C(O)]_(n)—, where X and Y are divalent radicals that may be the same or different and may be branched or linear, and n is a discrete integer from 2-100, usually from 2 to 50, and where either or both of X and Y comprises a biocompatible, substantially non-antigenic water-soluble repeat unit that may be linear or branched. Further exemplary water-soluble repeat units comprise an ethylene oxide of the formula —(CH₂—CH₂—O)— or —(O—CH₂—CH₂)—. The number of such water-soluble repeat units can vary significantly, with the usual number of such units being from 2 to 500, 2 to 400, 2 to 300, 2 to 200, 2 to 100, and most usually 2 to 50. An exemplary embodiment is one in which one or both of X and Y is selected from: —((CH₂)_(n1)—(CH₂—CH₂—O)_(n2)—(CH₂)— or —((CH₂)_(n1)—(O—CH₂—CH₂)_(n2)—(CH₂)_(n-1)—), where n1 is 1 to 6, 1 to 5, 1 to 4 and most usually 1 to 3, and where n2 is 2 to 50, 2 to 25, 2 to 15, 2 to 10, 2 to 8, and most usually 2 to 5. A further exemplary embodiment is one in which X is —(CH₂—CH₂)—, and where Y is —(CH₂—(CH₂—CH₂—O)₃—CH₂—CH₂—CH₂)— or —(CH₂—CH₂—CH₂—(O—CH₂—CH₂)₃—CH₂)—.

The polymer can include one or more spacers or linkers. Exemplary spacers or linkers include linear or branched moieties comprising one or more repeat units employed in a water-soluble polymer, diamino and/or diacid units, natural or unnatural amino acids or derivatives thereof, as well as aliphatic moieties, including alkyl, aryl, heteroalkyl, heteroaryl, alkoxy, and the like, which can contain, for example, up to 18 carbon atoms or even an additional polymer chain.

The polymer moiety, or one or more of the spacers or linkers of the polymer moiety when present, may include polymer chains or units that are biostable or biodegradable. For example, polymers with repeat linkages have varying degrees of stability under physiological conditions depending on bond lability. Polymers with such bonds can be categorized by their relative rates of hydrolysis under physiological conditions based on known hydrolysis rates of low molecular weight analogs, e.g., from less stable to more stable, e.g., polyurethanes (—NH—C(O)—O—) > polyorthoesters (—O—C((OR)(R′))—O—) > polyamides (—C(O)—NH—). Similarly, the linkage systems attaching a water-soluble polymer to a target molecule may be biostable or biodegradable, e.g., from less stable to more stable: carbonate (—O—C(O)—O—) > ester (—C(O)—O—) > urethane (—NH—C(O)—O—) > orthoester (—O—C((OR)(R′))—O—) > amide (—C(O)—NH—). In general, it may be desirable to avoid use of sulfated polysaccharide, depending on the lability of the sulfate group. In addition, it may be less desirable to use polycarbonates and polyesters. These bonds are provided by way of example, and are not intended to limit the types of bonds employable in the polymer chains or linkage systems of the water-soluble polymers useful in the modified aldehyde tagged polypeptides disclosed herein.

Methods for Conversion and Modification of an Aldehyde Tag

Conversion of an aldehyde tag present in an aldehyde tagged polypeptide can be accomplished by cell-based (in vivo) or cell-free (in vitro) methods. Similarly, modification of a converted aldehyde tag of an aldehyde tagged polypeptide can be accomplished by cell-based (in vivo) or cell-free (in vitro) methods. These are described in more detail below.

“In Vivo” Host Cells Conversion and Modification

Conversion of an aldehyde tag of an aldehyde tagged polypeptide can be accomplished by expression of the aldehyde tagged polypeptide in a cell that contains a suitable FGE. In this embodiment, conversion of the cysteine or serine of the aldehyde tag occurs during or following translation in the host cell. In this embodiment, the FGE of the host cell can be endogenous to the host cell, or the host cell can be recombinant for a suitable FGE that is heterologous to the host cell. FGE expression can be provided by an expression system endogenous to the FGE gene (e.g., expression is provided by a promoter and other control elements present in the native FGE gene of the host cell), or can be provided by a recombinant expression system in which the FGE coding sequence is operably linked to a heterologous promoter to provide for constitutive or inducible expression. Use of a strong promoter to provide high levels of FGE expression may be of particular interest in some embodiments.

Depending on the nature of the target polypeptide containing the aldehyde tag, following conversion the converted aldehyde tagged polypeptide is retained in the host cell intracellularly, is secreted, or is associated with the host cell extracellular membrane. Where the aldehyde tag of the aldehyde tagged polypeptide is present at the cell surface, modification of the converted aldehyde tag can be accomplished by use of a reactive partner to attach a moiety of the reactive partner to a FGly residue of a surface accessible aldehyde tag under physiological conditions. Conditions suitable for use to accomplish conjugation of a reactive partner moiety to an aldehyde tagged polypeptide are similar to those described in Mahal et al. (1997 May 16) Science 276(5315):1125-8.

“In Vitro” (Cell-Free) Conversion and Modification

In vitro (cell-free) conversion of an aldehyde tag of an aldehyde tagged polypeptide can be accomplished by contacting an aldehyde tagged polypeptide with an FGE under conditions suitable for conversion of a cysteine or serine of a sulfatase motif of the aldehyde tag to a FGly. For example, nucleic acid encoding an aldehyde tagged polypeptide can be expressed in an in vitro transcription/translation system in the presence of a suitable FGE to provide for production of converted aldehyde tagged polypeptides.

Alternatively, unconverted aldehyde tagged polypeptide can be isolated following recombinant production in a host cell lacking a suitable FGE or following synthetic production. The isolated aldehyde tagged polypeptide is then contacted with a suitable FGE under conditions to provide for aldehyde tag conversion. In this embodiment, if the aldehyde tag is not readily solvent accessible in the isolated polypeptide, the aldehyde tagged polypeptide can be unfolded by methods known in the art (e.g., using heat, adjustment of pH, chaotropic agents (e.g., urea, and the like), organic solvents (e.g., hydrocarbons: octane, benzene, chloroform), etc.) and the denatured protein contacted with a suitable FGE. The converted aldehyde tagged polypeptide can then be refolded under suitable conditions.

With respect to modification of converted aldehyde tagged polypeptides, modification is normally carried out in vitro. Converted aldehyde tagged polypeptide is isolated from a production source (e.g., recombinant host cell production, synthetic production), and contacted with a reactive partner under conditions suitable to provide for conjugation of a moiety of the reactive partner to the FGly of the aldehyde tag. If the aldehyde tag is not solvent accessible, the aldehyde tagged polypeptide can be unfolded by methods known in the art prior to reaction with a reactive partner.

Switchable Moieties Attached to Aldehyde Tag

In some embodiments, aldehyde tagged polypeptides can be modified in a manner so as to facilitate removal of a conjugated moiety at the FGly residue of the aldehyde tag and replacement with a different moiety. This aspect of the invention exploits the relative thermodynamic stability of conjugates formed with different reactive partners.

For example, as illustrated in FIG. 6, aldehydes readily react with hydrazide and aminooxy moieties to yield hydrazones and oximes, respectively. Although both of these conjugates are robust under physiological conditions, oximes are more thermodynamically stable. Moreover, thiosemicarbazides also readily react with aldehydes to form thiosemicarbazone conjugates, which are less thermodynamically stable than oximes. These differences in thermodynamic stability can be exploited for switching the lower stability hydrazone conjugate to a more stable oxime conjugate, and for switching the lower stability oxime conjugate to a more stable semicarbazone conjugate. This feature of the aldehyde tag allows the modification of the target protein with two reagents in sequence (i.e., sequentially), as illustrated in the Example below.

Modified Aldehyde Tagged Polypeptides

The reaction products produced by reaction of an aldehyde tagged polypeptide with a reactive partner comprising a moiety of interest are generally modified in a site-specific manner (i.e., at the FGly residue), providing for a substantially homogenous population of modified aldehyde tagged polypeptides. Heterogenous populations of such reaction products can be generated by use of two or more reactive partners comprising different moieties, where desired.

For example, where the target polypeptide is modified by PEGylation, the methods can be adapted to provide for production of a homogenous population of PEGylated polypeptides (in which the polypeptides are modified with the same PEG moieties) or a heterogenous population of PEGylated polypeptides (in which the polypeptides in the composition are modified with different types of PEG molecules).

Kits and Systems

Kits and systems are provided to facilitate and, where desired, standardize the compositions of the invention and the uses thereof. Kits contemplated herein can include one or more of a construct encoding an aldehyde tag for insertion into a target polypeptide; a construct encoding an aldehyde tagged polypeptide for expression in a host cell (e.g., as an expression cassette to provide for insertion of a coding sequence of a target polypeptide as an N-terminal or C-terminal fusion with the aldehyde tag); a host cell that produces an FGE compatible with an aldehyde tag of the kit, where the FGE may be endogenous, recombinant, or heterologous; a host cell genetically modified to express an aldehyde tagged polypeptide of interest, which host cell can further express an endogenous, recombinant, or heterologous FGE compatible for conversion of the aldehyde tag of the tagged polypeptide; and a reactive partner for chemical modification of the converted aldehyde tag of the tagged polypeptide.

In addition, the kit can contain instructions for using the components of the kit, particularly the compositions of the invention that are contained in the kit.

EXAMPLES

The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g. amounts, temperature, etc.) but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Centigrade, and pressure is at or near atmospheric.

Methods and Materials:

The following materials and methods were used in Examples 1-6 set out below.

Plasmid construction. The following oligonucleotides were used in the Examples below:

Primer                     Sequence (5′ → 3′)^(a)
ald₆-stf0 start^(b)        CCACTGTGCACACCATCGCGGATGTCCGACCACCCCACCGCC
ald₁₃-stf0 sense           CATGGCACCACTGTGCACACCATCGCGGGGCTCGCTGTTCACCGGCCGCGACGTCCA
ald₁₃-stf0 antisense       TATGGACGTCGCGGCCGGTGAACAGCGAGCCCCGCGATGGTGTGCACAGTGGTGC
ald-stf0 end               GCACCACCACCACCACCACTGAGATCCGGCTGC
ald-stf0 Cys5Ala^(b)       CCATGGCACCACTGGCCACACCATCGCGG
stf0 end^(b)               CGGCCGCGATGTGCGCCTTGAAGATCTGC
mbp sense                  GATCCCTGTGCACACCATCGCGGTGAGC
mbp nonsense               GGCCGCTCACCGCGATGGTGTGCACAGG
ald-mbp Cys5Ala^(b)        CCGCGTGGATCCCTGGCCACACCATCGCGG
ald₆-hGH start             CTATGCTACCATGGCGCTGTGCACACCATCGCGGACCATTCCCTTATCCAGGC
hGH end                    CTATGCTAGCGGCCGCGAAGCCACAGCTGCCCTCCAC
ald₆-hGH Cys5Ala^(b)       TATACCATGGCGCTGGCCACACCATCGCGGACC
fge start                  CTATGCTACCATGGCTGACCGAGTTGGTTGACCTGC
fge end                    TAGCATAGCTCGAGCTACCCGGACACCGGGTCG
fge in-frame^(b)           GAGGAATTAACCATGCTGACCGAGTTGGTTG

^(a)Aldehyde-encoding bases are underlined.
^(b)Site-directed mutagenesis primer. Where appropriate, numbered from the beginning of the respective protein start codon. A pair of complementary primers was used for each mutant. The reverse complements are not shown.

The sulfatase motifs of the constructs are provided below:

ald₁₃-Stf0: LCTPSRGSLFTGR-(mycobacterial sulfotransferase)

ald₁₃-Stf0 (C5A): LATPSRGSLFTGR-(mycobacterial sulfotransferase)

ald₆-Stf0: LCTPSR-(mycobacterial sulfotransferase)

ald₆-MBP: LCTPSR-(maltose binding protein)

ald₆-hGH: LCTPSR-(human growth hormone)

The nucleic acid encoding ald₁₃-Stf0 was constructed by ligating annealed oligonucleotides into a previously constructed pET28-Stf0 vector¹⁹ between NcoI and NdeI restriction sites. The stf0 stop codon was removed by QuikChange™ (Stratagene) mutagenesis to allow for a C-terminal His₆ tag. ald₆-Stf0 was constructed using QuikChange™ (Stratagene) to eliminate the nucleotides that encode the last 7 residues of the 13 amino acid aldehyde tag. The gene encoding ald₆-MBP was constructed by ligating annealed oligonucleotides into the pMALc-H vector¹⁹ between XhoI and HindIII restriction sites. The gene encoding hGH (human growth hormone 1 transcript variant 1, encoding residues 29-217) was amplified from pCMV-SPORT6.1.ccdb (Open Biosystems) using a 5′ primer that encoded the 6 amino acid aldehyde tag and ligated into pET28b between NcoI and NotI restriction sites. The gene encoding Mycobacterium tuberculosis FGE (Rv0712, encoding residues 2-299) was amplified from a previously prepared pET14b plasmid containing FGE¹⁴ and ligated into pBAD/myc-his A (Invitrogen) between NcoI and XhoI restriction sites. The FGE gene was placed in frame with the start codon using the QuikChange™ PCR mutagenesis kit (Stratagene). Cys→Ala mutants of ald-Stf0, ald-MBP, and ald-hGH were produced using QuikChange™ mutagenesis. DNA sequencing was performed to confirm the fidelity of each gene product. Protein-encoding plasmids were transformed into BL21(DE3) E. coli cells (Invitrogen).
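As an illustrative cross-check, and not part of the protocol described here, the ald₁₃-stf0 sense and antisense oligonucleotides listed above can be examined in a few lines of Python. The sketch assumes the standard genetic code, a 4 nt 5′ overhang on the sense strand and a 2 nt 5′ overhang on the antisense strand; the helper names `revcomp` and `translate` are hypothetical. It checks that the two strands are complementary outside their overhangs and that the sense strand, read from its ATG, encodes the 13 residue tag.

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons starting at the first base of seq."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# Sequences copied from the primer table (ald13-stf0 sense / antisense).
SENSE = "CATGGCACCACTGTGCACACCATCGCGGGGCTCGCTGTTCACCGGCCGCGACGTCCA"
ANTISENSE = "TATGGACGTCGCGGCCGGTGAACAGCGAGCCCCGCGATGGTGTGCACAGTGGTGC"

# Assumption: CATG (sense) and TA (antisense) are single-stranded overhangs.
duplex_ok = SENSE[4:] == revcomp(ANTISENSE[2:])

# The reading frame starts at the ATG, one base into the sense strand.
peptide = translate(SENSE[1:])

print(duplex_ok, peptide)  # prints: True MAPLCTPSRGSLFTGRDV
```

The translated peptide contains LCTPSRGSLFTGR, which matches the ald₁₃ sulfatase motif listed for this construct.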

Protein expression and purification. Clonal populations of BL21(DE3) E. coli cells harboring only an aldehyde-tagged protein-encoding plasmid were incubated in LB media with kanamycin with shaking at 37° C. until OD₆₀₀=0.5, at which time the temperature was lowered to 18° C. and 100 μM IPTG was added. BL21(DE3) E. coli cells harboring an aldehyde-tagged protein-encoding plasmid and an FGE-encoding plasmid were incubated in LB media with kanamycin and ampicillin with shaking at 37° C. until OD₆₀₀=0.5, at which time the FGE expression was induced with 0.02% arabinose. After 30 min, the temperature was lowered to 18° C. and 100 μM IPTG was added to induce expression of the aldehyde-tagged protein. After 12-16 h, cells were harvested and resuspended in 20 ml of lysis buffer (50 mM Tris, 500 mM NaCl, 10% glycerol, 20 mM imidazole, 1 mM DTT, 1 mM TCEP, 1 mM methionine, pH 7.5, for ald₆-hGH or 50 mM NaH₂PO₄, 300 mM NaCl, 10 mM imidazole, pH 7.4, for ald-Stf0 and ald₆-MBP) per liter of culture and lysed by sonication.

Cell lysates were treated with DNase (10 μg/ml), cleared by centrifugation and applied to a 1 ml HisTrap column (GE Healthcare). The column was washed with lysis buffer with 35 mM imidazole and His₆-tagged protein was eluted using lysis buffer with 250 mM imidazole. ald₆-hGH was further purified on a Sephadex 16/60 S300 column (GE Healthcare).

Tryptic Digestion and Standard Addition assay. 10 μg of protein was digested with 0.4 μg trypsin (Promega) at 37° C. for 16 hours in 50 mM NH₄HCO₃ pH 8. This protocol was deemed sufficient for complete digestion as no peptides containing missed cleavage sites were detected by MALDI-TOF mass spectrometry after 3 hours of digestion under identical conditions. Standard addition assays were run in water with about 0.6 μg protein digest per run. Synthetic peptides containing either the cysteine or aldehyde (FGly) were added in equimolar amounts followed by addition of 100 mM DTT. This solution was allowed to incubate at RT for 1 hour prior to mass spectrometry analysis (Agilent MSD). Blank runs were added between randomly selected runs and no residual signal was detected. Cysteine oxidation was not observed.

Small molecule labeling. Fluorescent labeling reactions were run with 10 μg target protein with 300 μM aminooxy dye (Alexa Fluor 647 C5-aminooxyacetamide, Invitrogen) in labeling buffer (100 mM MES pH 5.5, 1% SDS) at 37° C. for 2 hours. 166 mM methoxylamine was added to control reactions. Reaction mixtures were separated by SDS-PAGE and fluorescence was detected using a Typhoon 9410 scanner (GE Healthcare). Protein loading was determined by Sypro Ruby (Sigma) staining. Biotinylation was afforded by incubating 10 μg of target protein with 30 μM biotin hydrazide (Sigma) in labeling buffer for 2 hours at 37° C. Subsequent displacement of biotin hydrazide was afforded by addition of either 166 mM methoxylamine or 1 mM aminooxy-FLAG at 37° C. for 2 hours. The α-biotin western blot was performed using a standard protocol. The α-FLAG blot was obtained by stripping the membrane and reprobing with α-FLAG M2 (Sigma). Aminooxy-FLAG was synthesized using standard FMOC-based solid phase peptide synthesis protocols. The final residue added, C-terminal, was (t-Boc-aminooxy)acetic acid followed by cleavage under standard conditions.

PEGylation. Aminooxy-PEGs were created from monoamino-PEGs and aminooxyglycine using standard protocols. More specifically, aminooxyPEGs were produced by adding aminoPEGs (Shearwater Polymers) to activated (t-Boc-aminooxy)acetic acid using standard peptide coupling conditions. Briefly, amide bond formation was accomplished by adding aminoPEG to the preformed 8-hydroxybenzotriazole ester of (t-Boc-aminooxy)acetic acid (5 equivalents) in acetonitrile. Purification of the product was afforded by precipitation from ether, followed by trituration. Deprotection was accomplished by treatment with an aqueous trifluoroacetic acid solution (95% TFA, 5% H₂O) for 3 hours at RT. Precipitation into ether and trituration afforded a pure product as judged by ¹H NMR. Conjugation to aldehyde-tagged proteins was afforded by incubation of 10 μg target protein and 10 mM aminooxyPEG in coupling solution (49.95% CH₃CN, 49.95% H₂O, 0.1% TFA) for 1 hour followed by lyophilization.

Reaction mixtures were resuspended in water, separated by SDS-PAGE, stained with Sypro Orange (Invitrogen) and detected using a Typhoon 9410 scanner (GE Healthcare).

Example 1 Site-Specific Modification of a Sulfatase Motif in Protein Expressed in E. coli

Protein constructs with either N- or C-terminal aldehyde tags were expressed in E. coli. Three protein targets were explored: a C-terminally tagged maltose binding protein (MBP), an N-terminally tagged human growth hormone (hGH), and an N-terminally tagged mycobacterial sulfotransferase (Stf0). Additionally, two variants of the aldehyde tag were tested: a 13 residue tag (ald₁₃-Stf0) that included the entire sulfatase consensus motif, and a 6 residue tag (ald₆-Stf0) that included a shorter sequence containing a sulfatase consensus motif:

LCTPSRGSLFTGR-Stf0 (ald₁₃-Stf0)

LCTPSR-Stf0 (ald₆-Stf0).

In order to ensure efficient formation of FGly, tagged proteins were co-expressed with a prokaryotic FGE from Mycobacterium tuberculosis (Mtb) (described in the Examples below).

Tryptic digestion of the peptide containing the 13 residue sulfatase consensus motif (ald₁₃-Stf0) allowed direct mass spectral identification of FGly (FIG. 2). While the FGly-containing peptide could be easily identified, the cysteine containing peptide was not observed, indicating efficient oxidation of the aldehyde tag.
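The mass difference that distinguishes the FGly-containing peptide from the cysteine-containing peptide can be estimated from residue compositions. The following is a minimal sketch, assuming monoisotopic atomic masses and the residue formulas C₃H₅NOS for cysteine and C₃H₃NO₂ for FGly; the hydrated (gem-diol) form, C₃H₅NO₃, is included for comparison. The names `MASS` and `residue_mass` are hypothetical, and the sketch does not describe how the spectra in FIG. 2 were analyzed.

```python
# Monoisotopic atomic masses in Da.
MASS = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915, "S": 31.972071}


def residue_mass(formula):
    """Sum the monoisotopic mass of a residue given its elemental composition."""
    return sum(MASS[element] * count for element, count in formula.items())


CYS = {"C": 3, "H": 5, "N": 1, "O": 1, "S": 1}   # cysteine residue
FGLY = {"C": 3, "H": 3, "N": 1, "O": 2}          # formylglycine (aldehyde form)
FGLY_HYDRATE = {"C": 3, "H": 5, "N": 1, "O": 3}  # gem-diol form

shift_aldehyde = residue_mass(FGLY) - residue_mass(CYS)
shift_hydrate = residue_mass(FGLY_HYDRATE) - residue_mass(CYS)

print(f"{shift_aldehyde:.3f} {shift_hydrate:.3f}")  # prints: -17.993 0.018
```

Under these assumptions, conversion of Cys to the aldehyde form lowers the peptide mass by about 18 Da, whereas the hydrated form is nearly isobaric with the cysteine peptide.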

Example 2 Peptides Containing the 6 Residue Sulfatase Consensus Motif Demonstrate High Rates of Conversion

To quantify the extent of conversion from Cys to FGly, a standard addition assay was performed. The relative levels of cysteine and FGly within tryptically derived peptides from target proteins were compared to a standard addition curve, which was produced by doping synthetic peptides into tryptic digests at various concentrations (FIG. 3, panel a).

Unexpectedly, the Stf0 peptide containing the 6 residue sulfatase consensus motif (ald₆-Stf0) demonstrated slightly higher conversion than that of the Stf0 peptide containing the conserved 13 amino acid sequence (ald₁₃-Stf0), with conversion levels of 92±3% and 86±5%, respectively. This result is in contrast to previous sulfatase studies that indicated the distal threonine-glycine-arginine (TGR) sequence to be important for efficient cysteine oxidation (Dierks et al. (1999) EMBO J. 18(8):2084-91). The hGH peptide containing the 6 residue sulfatase consensus motif (ald₆-hGH) demonstrated significantly higher conversion at 99±5%.

Example 3 Conversion of Sulfatase Motifs by FGE is Independent of the Primary Sequence Context, thus Allowing for Positioning of Aldehyde Tags at either N-Terminal or C-Terminal Positions within a Polypeptide

Previous sulfatase studies have indicated the distal threonine-glycine-arginine (TGR) sequence in the 13 residue sulfatase consensus motif (underlined below) to be important for high-level cysteine oxidation:

LCTPSRGSLFTGR

Because formation of FGly is thought to occur co-translationally (Dierks et al. Proc Natl Acad Sci USA. 1997 Oct 28;94(22):11963-8), it was reasoned that C-terminal constructs might experience lower FGly formation due to inaccessibility of the aldehyde tag. This was tested by generating a C-terminally tagged polypeptide containing maltose-binding protein and the 6 residue sulfatase consensus motif (ald₆-MBP):

LCTPSR-(maltose binding protein)

Surprisingly, the C-terminally tagged ald₆-MBP also demonstrated nearly quantitative conversion at 99±2%. Considering that the sulfatase motif is natively found within the interior of sulfatases, these results indicate that aldehyde formation is not limited with respect to the tag's primary sequence position.

Example 4 Selective Fluorescent Labeling of Aldehyde Tagged Proteins

To demonstrate the specificity afforded by FGly introduction, a panel of aldehyde tagged proteins was labeled with ALEXA FLUOR® 647 aminooxyacetamide dye (Invitrogen):

ald₁₃-Stf0 (C5A): LATPSRGSLFTGR-(mycobacterial sulfotransferase)

ald₁₃-Stf0: LCTPSRGSLFTGR-(mycobacterial sulfotransferase)

ald₆-Stf0: LCTPSR-(mycobacterial sulfotransferase)

ald₆-MBP: LCTPSR-(maltose binding protein)

Aldehyde-tagged proteins demonstrated robust fluorophore labeling (FIG. 3, panel b: ald₁₃-Stf0 (−), ald₆-Stf0 (−), and ald₆-MBP (−)). In contrast, control proteins in which the critical cysteine in the aldehyde tag motif was mutated to alanine demonstrated only a small amount of background labeling (FIG. 3, panels c and d: ald₁₃-Stf0 (C5A) (−)). Aldehyde-tagged proteins incubated with an excess of methoxylamine, a competing nucleophile, demonstrated no labeling (FIG. 3, panels c and d: ald₁₃-Stf0 (C5A) (+), ald₁₃-Stf0 (+), ald₆-Stf0 (+), and ald₆-MBP (+)). Interestingly, although E. coli's genome does not contain an annotated FGE, aldehyde-tagged protein expressed without exogenous FGE still demonstrated fluorescent labeling, albeit with lower intensity. This indicates that E. coli must natively express an enzyme or enzymes that are capable of oxidation of the sulfatase motif.

Example 5 Modification of Aldehyde Tagged Proteins Can Provide for “Switchable” Moieties

Aldehydes readily react with hydrazide and aminooxy moieties to yield hydrazones and oximes, respectively. Although both of these conjugates are robust under physiological conditions, oximes are more thermodynamically stable. This difference can be exploited for switching the lower stability hydrazone conjugate to the more stable oxime conjugate. This feature of the aldehyde tag allows the modification of the target protein with two reagents in sequence (i.e., sequentially), as exemplified by conjugation of a purification tag followed by replacement of the conjugated purification tag to provide a conjugated fluorophore.

To assess the feasibility of this technique, a polypeptide containing maltose binding protein and the 6 residue sulfatase consensus motif (ald₆-MBP) was first labeled with biotin hydrazide and subsequently incubated with methoxylamine or an aminooxy epitope tag (aminooxy-FLAG). Labeling with biotin hydrazide led to a robust signal by α-biotin in a western blot (FIG. 4, panel a, lane 1). Subsequent incubation with methoxylamine or aminooxy-FLAG led to a complete loss of α-biotin signal (FIG. 4, panel a, lane 2) or a robust α-FLAG signal (FIG. 4, panel a, lane 3), respectively.

When the aminooxy-FLAG labeled protein was subsequently exposed to methoxylamine, only partial loss of signal was observed (data not shown), presumably due to the similar stabilities of the conjugates. These results indicate that sequential conjugation to an aldehyde-tagged protein can be programmed based on stability of the linkage chemistry.

Example 6 Creation of Site-Specific PEGylation to Produce PEG-Protein Conjugates in a Therapeutic Target Protein

To illustrate the use of aldehyde tags in mediating site-specific PEGylation, aldehyde tags were used to site-specifically attach polyethylene glycol (PEG) chains to recombinantly expressed ald₆-Stf0. ald₆-Stf0 was recombinantly expressed and modified with a series of aminooxy-PEGs with varying chain lengths. SDS-PAGE analysis of the Stf0-PEG conjugates demonstrated unambiguous mass shifts consistent with the molecular weight and charge of the appended PEG molecules (FIG. 4, Panel b). These results demonstrate the ease of obtaining site-specific PEG-protein conjugates regardless of the number of native cysteines or lysines.

The above provides proof of principle for application of aldehyde tags to mediate site-specific PEGylation of, for example, therapeutic proteins. PEGylation of pharmaceutical proteins is desirable as it can increase therapeutic index by increasing proteolytic stability and decreasing renal clearance. Additionally, PEGylation can be exploited to reduce immunogenicity of protein pharmaceuticals. Despite advances in protein conjugation chemistries, site-specific modification of proteins remains problematic. Derivatization of cysteine or lysine residues is currently the most utilized method to PEGylate proteins, but this non-specific labeling method results in PEGylation of multiple sites, creating an undesirable collection of discrete protein-PEG conjugates with different pharmacokinetics. The aldehyde tag technology described herein can be used to address needs such as these.

Example 7 Use of Aldehyde Tags to Modify Cell Surface Accessible Residues of a Polypeptide Expressed in Mammalian Cells

To demonstrate the introduction of FGly into a protein integral to the cell membrane, an aldehyde-tagged synthetic photoisomerizable azobenzene-regulated K+ (SPARK) channel protein was produced. SPARK channel proteins, which are light-activated K+ ion channels, were developed for non-invasive control of neuronal activity (Banghart et al. Nat. Neurosci. 2004).

The 6 residue aldehyde tag described above (ald₆ (LCTPSR)) was introduced into a construct encoding a SPARK channel protein. Three strategies were used: 1) adding the 6 residue sulfatase consensus motif of the ald-tag within one of the protein's extracellular loops (referred to as “I” in FIG. 5); 2) deleting 6 residues from the loop and then replacing these residues with the 6 residue ald-tag (referred to as “C” in FIG. 5); and 3) deleting 3 residues from the loop and then adding the 6 residue ald-tag (referred to as “P” in FIG. 5). A vector-only negative control was also run (“V” in FIG. 5).
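The three tagging strategies differ only in how many loop residues are removed before the tag is added, which can be sketched as simple string operations. The loop sequence, the insertion position and the function names below are hypothetical and chosen purely for illustration; the actual SPARK loop sequence is not given here.

```python
ALD6 = "LCTPSR"  # 6 residue aldehyde tag


def insert_tag(loop, pos, tag=ALD6):
    """Strategy "I": add the tag at pos without removing any loop residues."""
    return loop[:pos] + tag + loop[pos:]


def replace_with_tag(loop, pos, n_deleted, tag=ALD6):
    """Strategies "C" and "P": delete n_deleted residues at pos, then add the tag."""
    return loop[:pos] + tag + loop[pos + n_deleted:]


loop = "GSTNDEQKAVYW"  # hypothetical extracellular loop, for illustration only

variants = {
    "I": insert_tag(loop, 3),           # +6 residues overall
    "C": replace_with_tag(loop, 3, 6),  # 6 deleted, 6 added: length unchanged
    "P": replace_with_tag(loop, 3, 3),  # 3 deleted, 6 added: +3 residues
}

for name, seq in variants.items():
    print(name, seq, len(seq) - len(loop))
```

Only strategy “C” preserves the overall loop length in this sketch, which may be relevant when considering why that variant was the one that labeled successfully.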

Plasmids encoding each of the three variants of the recombinant, aldehyde-tagged SPARK channel were transfected into Chinese hamster ovary (CHO) cells and into human embryonic kidney (HEK) cells. Both CHO and HEK cells express an endogenous FGE. However, in order to increase conversion of the Cys of the aldehyde tag, a plasmid encoding the aldehyde tagged SPARK was co-transfected with a pcDNA3.1 construct encoding human FGE. The human FGE used has the amino acid sequence:

maapalglvc grcpelglvl lllllsllcg aagsqeagtg agagslagsc gcgtpqrpga hgssaaahry sreanapgpv pgerqlahsk mvpipagvft mgtddpqikq dgeaparrvt idafymdaye vsntefekfv nstgylteae kfgdsfvfeg mlseqvktni qqavaaapww lpvkganwrh pegpdstilh rpdhpvlhvs wndavayctw agkrlpteae weyscrgglh nrlfpwgnkl qpkgqhyani wqgefpvtnt gedgfqgtap vdafppngyg lynivgnawe wtsdwwtvhh sveetlnpkg ppsgkdrvkk ggsymchrsy cyryrcaars qntpdssasn lgfrcaadrl ptmd

After one day, the cells were lysed and the lysate was probed by Western blot for presence of a myc epitope (which is present in the SPARK channel protein, and thus demonstrates successful transfection and translation) and for the presence of the aldehyde by reaction with aminooxy-FLAG, followed by probing with an anti-FLAG antibody. A Ponceau blot demonstrated equal loading of samples from the same cell type on the blot.

As shown in FIG. 5, the strategy involving deletion of 6 residues of the SPARK extracellular loop and replacement with the 6-residue ald-tag was successful (see the arrow on the anti-FLAG blot in FIG. 5, panel c). This result demonstrates the ability to modify cell surface residues of an aldehyde-tagged protein in mammalian cells.

The presence of the FLAG of the aldehyde tagged SPARK on the surface of the cell can be confirmed using flow cytometry.

Example 8 Use of Aldehyde Tags to Modify Fc Antibody Fragment

In order to further demonstrate applications of aldehyde tags, a soluble IgG Fc fragment was modified to contain an aldehyde tag at either the N- or C-terminus. Briefly, a 13-residue aldehyde tag (ald₁₃) (LCTPSRAALLTGR) was introduced so as to position the aldehyde tag at either the N-terminus or the C-terminus of the soluble IgG Fc fragment encoded in the commercially available pFuse-Fc vector (Invitrogen). In order to increase conversion of the Cys of the aldehyde tag, CHO cells were co-transfected with the Fc encoding construct and a pcDNA3.1 construct encoding human FGE.

Fc fragments were isolated from cell supernatant, and detection of the aldehyde-tagged IgG Fc fragment in which the Cys was converted to FGly was accomplished by reacting the isolated protein with an aminooxy-FLAG (DYKDDDDK) probe (FLAG-ONH₂), followed by SDS-PAGE and Western-blot analysis. Proteins that were not reacted with the FLAG-ONH₂ probe served as an additional control.

Whereas Fc fusions containing the aldehyde tag 13 mer gave robust labeling when present at either the N-terminus (N-Fc-Ald13) or C-terminus (C-Fc-Ald13), the control protein, in which the critical cysteine had been mutated to alanine (C to A mutation), gave no detectable signal (FIG. 17).

In order to assess whether a 6 mer aldehyde tag is sufficient to mediate modification of a protein, IgG Fc fragments having a 6 mer aldehyde tag (Fc-Ald) or a control tag (Fc-C→A) at the C-terminus were generated using the pFuse-Fc vector. Aldehyde tagged IgG Fc fragments were detected by reacting the isolated protein with an aminooxy-FLAG probe (FLAG-ONH₂), followed by SDS-PAGE and Western blot. Proteins that were not reacted with the FLAG-ONH₂ probe served as an additional control. As shown in FIG. 18, the 6 mer aldehyde tag facilitated robust labeling of the Fc-Ald, while no detectable labeling was observed with Fc fragments modified to include the control tag. Constructs encoding IgG Fc fragments having the 6 mer aldehyde tag positioned at the N-terminus yielded similar results (data not shown).

In order to confirm formylglycine (FGly) modification of the Fc fragments, N- or C-terminally tagged ald13-Fc fragments were subjected to tryptic digestion to allow for direct mass spectral identification of FGly. As shown in FIGS. 19 and 20, the FGly-containing peptide and the cysteine containing peptide could be easily identified from both N-terminally and C-terminally modified Fc fragments.

Specific labeling of the aldehyde-tagged Fc was also realized by subjecting the serum-free medium directly to the aminooxy-FLAG probe (data not shown).

Example 9 Efficiency of Conversion of Cys to FGly in Aldehyde Tagged Proteins

In order to quantify the extent of conversion from Cys to FGly, an assay was developed to analyze conversion efficiency of trypsin-digested target proteins. The quantity of the unmodified peptide containing cysteine was determined from a standard curve, which was produced by doping synthetic peptides into tryptic digests at various concentrations. The quantity of the FGly-containing peptide was calculated by subtracting the quantity of the cysteine-containing peptide from the total protein quantity, determined using BCA protein assay.
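The subtraction described above can be written out as a short calculation. The sketch below assumes a linear standard curve relating the cysteine-peptide signal to its quantity; all numbers are hypothetical placeholders rather than measured data, and the function names `fit_line` and `percent_conversion` are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def percent_conversion(cys_signal, slope, intercept, total_pmol):
    """FGly quantity = total protein - Cys peptide; return FGly as % of total."""
    cys_pmol = (cys_signal - intercept) / slope
    fgly_pmol = total_pmol - cys_pmol
    return 100.0 * fgly_pmol / total_pmol


# Hypothetical standard curve: synthetic Cys peptide doped into a tryptic digest.
std_pmol = [0.0, 5.0, 10.0, 20.0]
std_signal = [0.0, 50.0, 100.0, 200.0]
slope, intercept = fit_line(std_pmol, std_signal)

# Hypothetical sample: Cys-peptide signal of 14 with 10 pmol total protein (BCA).
conversion = percent_conversion(14.0, slope, intercept, total_pmol=10.0)
print(f"{conversion:.1f}% conversion")
```

Because the FGly quantity is obtained by difference, any error in the total protein measurement propagates directly into the reported conversion.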

When this assay was applied to the N- and C-terminally tagged Fc fragment described in the Example above, it was found that in the presence of exogenous human FGE (hFGE) the efficiency of conversion from Cys to FGly was 86±1% for the N-terminally tagged ald13-Fc and 58±2% for C-terminally tagged Fc-ald13. In contrast, in the absence of exogenous hFGE, the efficiency of conversion was only about 25% and about 23% for N- and C-terminally tagged Fc fragment, respectively. C-terminally modified Fc fragment containing a 6 mer aldehyde tag exhibited a conversion efficiency of about 92% in the presence of exogenous hFGE.

Example 10 Aldehyde Tag-Mediated Modification of Cell Surface Proteins

This example demonstrates that aldehyde tags can be used to facilitate site-specific modification of a cell surface protein, the platelet derived growth factor receptor (PDGFR) transmembrane domain (encoded by the pDisplay vector from Invitrogen), in live HEK cells using the same approach.

The 13 mer aldehyde tag (LCTPSRAALLTGR) or a control tag (LATPSRAALLTGR) was introduced into a pDisplay™ expression construct (Invitrogen; FIG. 21) between Bgl II and Sal I sites. The resulting fusion proteins are referred to here as Ald13-TM (containing the 13 mer aldehyde tag) and C→A-TM (containing the control tag). This expression construct and a construct expressing human FGE (hFGE) were transiently transfected into HEK293-T cells to provide for expression.

Cells were labeled by reaction with an oxyamino biotin and then probed with streptavidin Alexa Fluor 488 conjugates. The cells were then subjected to analysis by flow cytometry.

As illustrated in FIG. 21, the mean fluorescence of cells expressing the Ald13-TM surface protein was significantly higher (mean fluorescence about 24.42) than that of cells expressing the C→A-TM control (mean fluorescence about 3.31).

Example 11 Aldehyde Tag Modification for Labeling of Cytosolic Protein

To illustrate the use of the aldehyde tag in specific labeling of cytosolic proteins, constructs encoding aldehyde tagged or control tagged green fluorescent protein derived from Aequorea coerulescens (AcGFP) were generated. Using the commercially available pAcGFP1-N1 vector (Clontech), an expression construct encoding an AcGFP fusion protein composed of a His tag (six histidine residues, represented by His₆) followed by a 13 mer aldehyde tag (LCTPSRAALLTGR) or a control tag (LATPSRAALLTGR) positioned at the N terminus of AcGFP was generated by insertion of the His tag and 13 mer aldehyde tag coding sequences between the Kpn I and Xma I restriction sites.

A bacterial FGE homolog derived from Streptomyces coelicolor (StrepFGE) was cloned into a mammalian expression vector (pcDNA 3.1, Invitrogen) for cotransfection of HEK cells with a plasmid encoding aldehyde tagged GFP (Ald-AcGFP) or control tagged GFP (C→A-AcGFP). Cells lacking expression of StrepFGE were used as a further control.

Detection of the aldehyde-tagged AcGFP that contained FGly was accomplished by reacting the isolated protein with an aminooxy-FLAG (DYKDDDDK) probe (FLAG-ONH₂), followed by SDS-PAGE and Western-blot analysis. Proteins that were not reacted with the FLAG-ONH₂ probe served as an additional control.

In the presence of the cytosolic FGE homolog, the cysteine residue within the consensus sequence was efficiently converted to a formylglycine (FGly) (FIG. 22), while control tagged AcGFP did not exhibit detectable labeling, indicating no detectable FGly. In addition, Ald-GFP produced in HEK cells that did not express StrepFGE also produced a strong signal (FIG. 22). This may be due to the method of protein isolation, in which the HEK cells are lysed; lysis may free the hFGE from the ER of these cells, allowing contact between the hFGE and the aldehyde tag and resulting in cysteine conversion to the aldehyde.

Example 12 Aldehyde Tag Modification of IFN-Beta

Aldehyde tags can be used to facilitate modification of a variety of proteins. Exemplary proteins of interest for modification include interferon beta (IFN-beta). IFN-beta is composed of five alpha-helices (A-E) with a single glycosylation site existing at residue Asn-80. IFN-beta can be modified to provide for modification at the glycosylation site and/or at other solvent accessible sites of the protein. For example, the amino acid sequence of IFN-beta that facilitates glycosylation can be modified so as to provide an aldehyde tag. For example, using recombinant techniques, the IFN-beta sequence DSSSTGWNE present in a loop of IFN-beta can be replaced with the aldehyde tag-containing sequence GSLCTPSRG. The aldehyde tag can then be exploited to attach a moiety of interest, as exemplified in FIG. 23.

Examples 13-14 Identification and Characterization of an FGE from M. tuberculosis

In the following Examples, a prokaryotic FGE is functionally identified in Mycobacterium tuberculosis (Mtb). As discussed above, sulfatases are members of an expanding family of enzymes that employ novel co- or post-translationally derived cofactors to facilitate catalysis, and contain an active site FGly residue. The FGly residue is thought to undergo hydration to the gem-diol, after which one of the hydroxyl groups acts as a catalytic nucleophile to initiate sulfate ester cleavage (FIG. 7, Panel a). The FGly residue is located within a sulfatase consensus sequence, which defines the sulfatase family of enzymes and is highly conserved throughout all domains of life (FIG. 7, Panel b). Whereas FGly is formed from cysteine residues in eukaryotic sulfatases, either cysteine (within the core motif CXPXR) or serine (SXPXR) can be oxidized to FGly in prokaryotic sulfatases. Some prokaryotes, such as Mtb, encode only cysteine-type sulfatases, whereas other species have only serine-type sulfatases or a combination of both.
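The core motifs CXPXR and SXPXR lend themselves to a simple pattern search. The following is a minimal sketch using Python's standard `re` module; the function name `find_core_motifs` is hypothetical, and a match to the pentapeptide pattern alone does not establish that a sequence is a substrate for an FGE.

```python
import re

# Core sulfatase motif: Cys or Ser, any residue, Pro, any residue, Arg.
# The lookahead lets overlapping candidates be reported as well.
CORE_MOTIF = re.compile(r"(?=([CS].P.R))")


def find_core_motifs(protein):
    """Return (0-based position, pentapeptide, type) for each core motif hit."""
    hits = []
    for match in CORE_MOTIF.finditer(protein):
        motif = match.group(1)
        kind = "cysteine-type" if motif[0] == "C" else "serine-type"
        hits.append((match.start(), motif, kind))
    return hits


print(find_core_motifs("LCTPSRGSLFTGR"))  # 13 residue aldehyde tag
print(find_core_motifs("LATPSRGSLFTGR"))  # Cys-to-Ala control tag
```

The 13 residue tag yields a single cysteine-type hit (CTPSR), while the Cys→Ala control yields none, consistent with its use as a negative control.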

Examples 13-14 describe characterization of a prokaryotic FGE from Mtb and the solved structure of the ortholog from Strep. Our studies indicate that FGE-activated sulfatases account for approximately half of the total sulfatase activity in Mtb lysate, suggesting that this organism possesses FGE-independent sulfatases that have yet to be identified. Defining the complete repertoire of sulfatases from Mtb (and other prokaryotes) is an important future goal and will provide the platform for defining these enzymes' role in the organism's lifecycle and pathogenesis.

Methods and Materials

The following methods and materials were used in the Examples relating to identification of an FGE in Mycobacterium tuberculosis (Mtb), and production of an FGE-deficient Mtb strain.

Preparation of protein expression vectors. The table below lists the oligonucleotides used in the examples below. The gene encoding Mtb FGE (Rv0712, encoding residues 2-299) was amplified from Mtb H37Rv genomic DNA and cloned into pET14b (Novagen) using NdeI and XhoI restriction sites. The gene encoding Strep FGE (SCO7548, encoding residues 2-314) was amplified from Strep A3(2) genomic DNA and cloned into pET151/D-TOPO (Invitrogen). Open reading frames Rv2407 (encoding residues 2-273), Rv3406 (encoding residues 2-295), and Rv3762c (encoding residues 2-626) were amplified from Mtb H37Rv genomic DNA. Rv2407 was ligated into pMAL-C2X (New England Biolabs) using BamHI and PstI restriction sites, and both Rv3406 and Rv3762c were ligated into pET28b (Novagen) using NdeI and XhoI restriction sites. DNA sequencing was performed to confirm the fidelity of each gene product. Protein-encoding plasmids were transformed into BL21(DE3) cells (Invitrogen).

Oligonucleotide primers

Primer | Sequence (5′ → 3′)
Mtb fge Start | CTATGCTACATATGCTGACCGAGTTGGTTGACCTGC
Mtb fge End | TAGCATAGCTCGAGCTACCCGGACACCGGGTCG
Strep fge Start | CACCGCCGTGGCCGCCCCGTCCCC
Strep fge End | TCACTCAGCGGCTGATCCGG
Mtb Rv2407 Start | CTATGCTAGGATCCCTTGAGATCACGTTGCTCGG
Mtb Rv2407 End | CTATGCTACTGCAGCTAGCGCCGCGGGTGCACCTC
Mtb Rv3406 Start | CTATGCTACATATGACAGATCTGATTACCGTGAAG
Mtb Rv3406 End | CTATGCTACTCGAGTCAGCCAGCGATCTCCATCG
Mtb Rv3762c Start | CTATGCTACATATGCCGATGGAACACAAACCTCC
Mtb Rv3762c End | CTATGCTACTCGAGCTACGGCGTCACGATGTTGAAG
Mtb fge Ser260Ala^(a) | GACCCTCAAGGGCGGCGCACACCTGTGCGCGCCG
Mtb fge Cys263Ser^(a) | TCGCACCTGAGCGCGCCGGAGTACTGC
Mtb fge Cys268Ser^(a) | GCGCCGGAGTACAGCCACCGCTACCGC
Strep fge Trp234Ala^(a) | CACCGCGGGCAACGTGGCGGAATGGTGCTCCGAC
Strep fge Trp234Phe^(a) | CACCGCGGGCAACGTGTTTGAATGGTGCTCCGAC
Strep fge Cys272Ser^(a) | GGCGGCTCCTACCTGTCCCACGACTCCTACTGC
Strep fge Cys277Ser^(a) | GTGCCACGACTCCTACTCCAACCGCTACCGGGTCG
Mtb Δfge upstream 5′ | CTATGCTAAAGCTTGAATCGAGTGAGATATTGCC
Mtb Δfge upstream 3′ | TAGCATAGTCTAGAATGACGCTCGATCGAGAACG
Mtb Δfge downstream 5′ | CTATGCTATCTAGATCCTCACAGTCGCAGGACAGC
Mtb Δfge downstream 3′ | TAGCATAGTTAATTAATGCACCATCTCGTTGCTCTCG

^(a)Numbered from the beginning of the respective FGE start codon. A pair of complementary primers was used for each mutant. Reverse complements are not shown; changes to the sequence are underlined.
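A quick consistency check on the cloning primers is to confirm that each carries the recognition sequence of the restriction enzyme named for its construct. The sketch below uses the well-known recognition sequences for NdeI, XhoI, BamHI and PstI; the pairing of each enzyme with the Start or End primer is an assumption made for illustration, and the names `SITES`, `PRIMERS` and `site_position` are hypothetical.

```python
# Recognition sequences (5' to 3') of the enzymes named in the text.
SITES = {"NdeI": "CATATG", "XhoI": "CTCGAG", "BamHI": "GGATCC", "PstI": "CTGCAG"}

# Primer sequences copied from the table; enzyme assignment is an assumption.
PRIMERS = {
    "Mtb fge Start": ("CTATGCTACATATGCTGACCGAGTTGGTTGACCTGC", "NdeI"),
    "Mtb fge End": ("TAGCATAGCTCGAGCTACCCGGACACCGGGTCG", "XhoI"),
    "Mtb Rv2407 Start": ("CTATGCTAGGATCCCTTGAGATCACGTTGCTCGG", "BamHI"),
    "Mtb Rv2407 End": ("CTATGCTACTGCAGCTAGCGCCGCGGGTGCACCTC", "PstI"),
    "Mtb Rv3406 Start": ("CTATGCTACATATGACAGATCTGATTACCGTGAAG", "NdeI"),
    "Mtb Rv3406 End": ("CTATGCTACTCGAGTCAGCCAGCGATCTCCATCG", "XhoI"),
}


def site_position(primer, enzyme):
    """Return the 0-based index of the enzyme's site in the primer, or -1."""
    return primer.find(SITES[enzyme])


report = {name: site_position(seq, enz) for name, (seq, enz) in PRIMERS.items()}
for name, pos in report.items():
    print(f"{name}: site at index {pos}")
```

In each of these primers the site follows an 8 nt 5′ extension, so every reported index is 8.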

Site directed mutagenesis. Site-specific mutations in Mtb FGE and StrepFGE were produced using QuikChange PCR mutagenesis kit (Stratagene).pET14b Mtb FGE and pET151 Strep FGE plasmids and the appropriateoligonucleotides from the table above were used in the mutagenesisreactions. Mutations were confirmed by DNA sequencing and plasmids weretransformed into BL21(DE3) cells for protein expression as describedbelow.

Protein expression and purification. Clonal populations of BL21(DE3)cells harboring a His₆-tagged protein-encoding plasmid were incubated inLB media with ampicillin or kanamycin with shaking at 37° C. untilOD₆₀₀=0.5, at which time the temperature was lowered to 18° C. and 250μM IPTG was added. After 12-16 h, cells were harvested and resuspendedin 20 ml of lysis buffer (50 mM Tris, 500 mM NaCl, 10% glycerol, 20 mMimidazole, 1 mM DTT, 1 mM TCEP, 1 mM methionine, pH 7.5) per liter ofculture and lysed by sonication. Cell lysate was treated with DNase (10μg/ml), cleared by centrifugation and applied to a 1 ml HisTrap column(GE Healthcare). The column was washed with lysis buffer with 35 mMimidazole and His₆-tagged protein was eluted using lysis buffer with 250mM imidazole. The elution volume was concentrated to less than 2 ml ifnecessary and further purified on a Sephadex 16/60 5300 column (GEHealthcare). Purified recombinant protein was subsequently concentratedto about 20 mg/ml.

The identity and purity of Mtb and Strep FGE was assessed byelectrospray ionization mass spectrometry (Bruker/Agilent Esquire).Rv2407 was not soluble in His₆-tagged form and was alternatively fusedto maltose binding protein (MBP). Growth and lysis conditions forMBP-Rv2407 producing cells were the same as above except with theabsence of imidazole in the lysis buffer. Cleared lysate was applied toamylose resin (New England Biolabs) in lysis buffer, washed inadditional lysis buffer, and MBP-Rv2407 was eluted in lysis buffer with10 mM maltose and subsequently concentrated. MBP was cleaved and removedfrom Rv2407 using Factor Xa (New England Biolabs) and amylose resin,respectively.

Strep FGE crystallization. Attempts to crystallize FGE homologs fromMtb, Mycobacterium smegmatis and Mycobacterium avium were not successfuldue to protein instability. Strep FGE was dialyzed into 10 mM Tris pH7.5, 150 mM NaCl, and 1 mM TCEP. Crystals of His₆-tagged Strep FGE wereobtained using vapor diffusion by mixing 1 μl of dialyzed protein with 1μl of crystallization solution (100 mM Tris pH 8.0, 2.4 M ammoniumformate, 0.3% β-octylglucoside, 3.2% 2-butanol) at room temperature(RT). Crystals grew over a period of two weeks and were subsequentlytransferred to cryoprotectant consisting of crystallization solutionwith 20% glycerol.

Strep FGE structure determination. Data were collected at beamline 8.2.2 at the Advanced Light Source using an ADSC Quantum-Q315 CCD detector. Diffraction data were processed using HKL2000 (Otwinowski et al. (1997) Methods Enzymol: Macromol Crystallogr Part A 276, 307-326). Initial phases were determined by molecular replacement using the human FGE (PDB entry 1Y1E) as a search model in PHASER (Storoni et al. (2004) Acta Crystallogr D Biol Crystallogr 60, 432-8). The asymmetric unit contained five Strep FGE monomers in space group P3₁21. Initial stages of model refinement included cycles of simulated annealing with torsion angle dynamics and restrained B-factor refinement using CNS (Brunger et al. (1998) Acta Crystallogr D Biol Crystallogr 54, 905-21), followed by manual model rebuilding using O (Jones et al. (1991) Acta Crystallogr A 47 (Pt 2), 110-9). The final cycles of refinement were carried out with TLS (Winn et al. (2001) Acta Crystallogr D Biol Crystallogr 57, 122-33) restraints as implemented in REFMAC5 (Murshudov (1997) Acta Crystallogr D Biol Crystallogr 53, 240-55) using 5 TLS groups (one for each FGE monomer in the asymmetric unit). Water molecules were added with ARP/WARP (Lamzin et al. (1993) Acta Crystallogr D Biol Crystallogr 49, 129-47). The final model contained residues 18-305 in monomer A, residues 19-306 in monomer B, residues 20-306 in monomer C, residues 19-305 in monomer D, and residues 19-307 in monomer E. Final R_(work) and R_(free) values were 19.5% and 23.3%, respectively. Data collection and processing statistics are summarized in the table below. All figures were generated with PyMOL (www.pymol.org).

Data collection
Resolution (Å)^(a)                        20-2.1 (2.1-2.17)
Wavelength (eV)                           12,398.4
Space group                               P3₁21
Unit cell dimensions (a = b, c) (Å)       142.444, 217.067
Measured reflections                      123276
Completeness (%)                          83.4 (88.2)
Redundancy                                2.6 (2.6)
Mosaicity (°)                             0.32
I/σ                                       15.8 (1.9)
R_(sym) (%)^(b)                           5.7 (23.3)

Refinement
R_(work) (%)^(c)                          19.5
R_(free) (%)^(c)                          23.3
Number of residues/waters                 1438/1017
Rms bonds (Å)/angles (°)                  0.008/1.062
Ramachandran plot (%)^(d)                 87.9/11.2/0.5/0.6^(e)
Average B values                          41.5

^(a) Values in parentheses correspond to the highest resolution bin.
^(b) R_(sym) = 100*Σ_(h)Σ_(i)|I_(i)(h) − <I(h)>|/Σ_(h)Σ_(i)I_(i)(h), where I_(i)(h) is the i^(th) measurement of reflection h and <I(h)> is the average value of the reflection intensity.
^(c) R_(work) = 100*Σ||F_(obs)| − |F_(calc)||/Σ|F_(obs)|, where F_(obs) and F_(calc) are the structure factor amplitudes from the data and the model, respectively. R_(free) is R_(work) with 5% of the reflections set aside throughout refinement.
^(d) Numbers correspond to the percentage of amino acid residues in the favored, allowed, generously allowed and disallowed regions, respectively. Calculated using PROCHECK₃₉.
^(e) Seven residues were observed in stereochemically strained conformations either due to crystal packing contacts (Tyr219 in monomers A and C) or hydrogen bonding interactions (Asn232 in monomers A-E).
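The R_(sym) and R_(work) definitions in the table footnotes can be illustrated with a minimal Python sketch. The function names, the dictionary layout (reflection index mapped to a list of repeated intensity measurements) and the toy numbers are assumptions made for illustration only; they are not taken from HKL2000, REFMAC or any other program cited here, and they are not data from this structure.

```python
from typing import Dict, List, Sequence


def r_sym(measurements: Dict[str, List[float]]) -> float:
    """R_sym (%) = 100 * sum_h sum_i |I_i(h) - <I(h)>| / sum_h sum_i I_i(h).

    `measurements` maps a reflection label h to its repeated intensity
    measurements I_i(h). The layout is a hypothetical one for this sketch.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in measurements.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(intensities)
    return 100.0 * numerator / denominator


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R (%) = 100 * sum ||F_obs| - |F_calc|| / sum |F_obs|.

    Evaluated over the working set this gives R_work; evaluated over the
    reflections set aside from refinement it gives R_free.
    """
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return 100.0 * numerator / denominator


# Toy values, invented purely to exercise the functions.
toy_intensities = {"1 0 0": [90.0, 110.0], "0 1 0": [50.0, 50.0]}
toy_r_sym = r_sym(toy_intensities)
toy_r_work = r_factor([10.0, 20.0], [9.0, 22.0])
```

The same `r_factor` function serves for both R_(work) and R_(free); only the subset of reflections passed in differs.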

FGE activity assay. Wild-type and mutant FGE from Mtb and Strep were purified as described above. The peptide substrate was synthesized by standard Fmoc solid-phase synthesis methods and consisted of the 13-residue sequence LCSPSRGSLFTGR, a sulfatase consensus motif. The N-terminus was acetylated, the C-terminus was amidated and the sequence was confirmed by mass spectrometry. Assay conditions were similar to those reported previously by Dierks et al. in studies of human FGE (Dierks et al. (2003) Cell 113, 435-44). Anaerobic experiments were performed in the same manner except that solutions were made anaerobic using an oxygen-scavenged gas manifold and reactions were started by mixing enzyme with substrate in an anaerobic glovebox. EDTA was added to the appropriate reactions at a concentration of 100 mM. Confirmation of FGly formation was performed by incubating 1 μl of desalted product with 1 μl of 5 mM biotin hydrazide (Sigma) for 30 min at RT. Samples were mixed 1:1 (v/v) with matrix solution (10 mg/ml α-cyano-4-hydroxy-cinnamic acid with 2 mM ammonium citrate) and analyzed by matrix-assisted laser desorption/ionization-time of flight mass spectrometry (Applied Biosystems Voyager DE Pro).
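The mass changes that such an assay relies on can be estimated with a short Python sketch. It assumes standard monoisotopic masses, a single Cys converted to FGly (net loss of two hydrogens and sulfur, gain of one oxygen), and hydrazone formation as condensation of biotin hydrazide (C10H18N4O2S) with loss of one water. The function and key names are hypothetical, and the values are theoretical neutral masses, not measured spectra from this work.

```python
# Monoisotopic residue masses (Da) for the residues present in the assay peptide.
RESIDUE_MASS = {
    "G": 57.02146, "S": 87.03203, "P": 97.05276, "T": 101.04768,
    "C": 103.00919, "L": 113.08406, "F": 147.06841, "R": 156.10111,
}
WATER = 18.01056
ACETYL = 42.01057             # N-terminal acetylation adds C2H2O
AMIDATION = -0.98402          # C-terminal amidation replaces OH with NH2
CYS_TO_FGLY = -17.99281       # Cys (C3H5NOS) -> FGly (C3H3NO2)
BIOTIN_HYDRAZIDE = 258.11505  # C10H18N4O2S


def peptide_mass(sequence: str, acetylated: bool = True, amidated: bool = True) -> float:
    """Neutral monoisotopic mass of a linear peptide with optional end caps."""
    mass = sum(RESIDUE_MASS[residue] for residue in sequence) + WATER
    if acetylated:
        mass += ACETYL
    if amidated:
        mass += AMIDATION
    return mass


def expected_masses(sequence: str) -> dict:
    """Theoretical masses for substrate, FGly product and hydrazone adduct.

    Assumes exactly one cysteine is converted to FGly.
    """
    if "C" not in sequence:
        raise ValueError("sequence has no cysteine to convert to FGly")
    substrate = peptide_mass(sequence)
    fgly_product = substrate + CYS_TO_FGLY
    # Hydrazone formation: aldehyde + hydrazide -> hydrazone + H2O
    hydrazone_adduct = fgly_product + BIOTIN_HYDRAZIDE - WATER
    return {
        "substrate": substrate,
        "fgly_product": fgly_product,
        "hydrazone_adduct": hydrazone_adduct,
    }


masses = expected_masses("LCSPSRGSLFTGR")
for label, value in masses.items():
    print(f"{label}: {value:.4f} Da (neutral, monoisotopic)")
```

Observed MALDI peaks would additionally reflect the ionizing adduct (for example a proton), and the aldehyde may also appear in hydrated form, so these numbers are a starting point for peak assignment rather than a prediction of the spectrum.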

Metal detection. A multi-element standard solution was prepared by appropriate dilution of ICP standards of Ca, Cu, Fe, Mn, Mg, and Zn (Sigma). Metal content of Mtb and Strep FGE was analyzed by ICP-AES using a Perkin Elmer Optima 3000 DV. Absence of Fe, Cu and Zn in Strep FGE was confirmed at beamline 8.3.1 at the Advanced Light Source. Absorption edges of these metals were examined using a double crystal monochromator and the beamline's x-ray fluorescence detector.

Mtb FGE-deficient strain production. An unmarked, in-frame genetic deletion of the FGE-encoding open reading frame Rv0712 was created in Mtb H37Rv using allelic replacement (Parish et al. (2000) Microbiology 146 (Pt 8), 1969-75). A 2-kb region upstream of Rv0712 was amplified and inserted into the mycobacterial delivery vector p2NILX between HindIII and XbaI restriction sites. p2NILX is derived from pNIL (Parish et al. (2000) Microbiology 146 (Pt 8), 1969-75) and modified with the addition of an XbaI restriction site between KpnI and NotI restriction sites. A 2-kb region downstream of Rv0712 was amplified and inserted into p2NILX between XbaI and PacI restriction sites. Selection markers lacZ and sacB were digested from pGOAL17 and ligated into p2NILX using the PacI restriction site. The completed delivery vector was treated with UV light (120 mJ cm⁻²) and electroporated into electrocompetent Mtb H37Rv as previously described (Hatfull, G. F. & Jacobs, W. R. J. (eds.) Molecular Genetics of Mycobacteria (ASM Press, Washington, D.C., 2000)). Selection of the mutant was performed as previously described (Parish et al. (2000) Microbiology 146 (Pt 8), 1969-75), and genotype was confirmed by Southern analysis (FIG. 9). The complemented strain was produced by transforming the Δfge strain with the integrating vector pMV306.kan containing the entire Rv0712 open reading frame under the control of the glutamine synthase promoter.

Sulfatase/phosphatase assay. Mtb H37Rv strains were grown in 7H9 media supplemented with ADC (Becton Dickinson) at 37° C. until OD₆₀₀=1.0. Cells were lysed by mechanical disruption using 0.1 mm zirconia beads (FastPrep, MP Biomedicals) and the crude lysate was cleared by centrifugation and filtered through a 0.22 μm membrane. Cleared lysate samples were normalized for total protein concentration (Biorad AC/DC protein assay kit) and 50 μg of lysate protein was added to buffer (50 mM Tris pH 7.5, 500 mM NaCl, 100 μM MgCl₂, 100 μM MnCl₂, 100 μM CaCl₂), protease inhibitors (Protease Inhibitor cocktail set III, EMD Bioscience), and 8 mM 4-methylumbelliferyl sulfate (4MUS). Limpet sulfatase (Sigma) was used at a final concentration of 1 μg/ml as a positive control. Reactions were incubated at 37° C. for 3 h and stopped by adding 4 volumes of 0.5 M Na₂CO₃/NaHCO₃ pH 10.5. Sulfatase activity was measured with a fluorimeter (Gemini XL, Molecular Devices) using excitation and emission wavelengths of 360 nm and 460 nm, respectively. Sulfatase/phosphatase inhibitors were used per manufacturer's instructions and included microcystin, cantharidin, p-bromotetramisole, sodium vanadate, sodium molybdate, sodium tartrate, and imidazole (Phosphatase Inhibitor Cocktail 1 & 2, Sigma). Sulfatase activity of recombinant Rv2407, Rv3406 and Rv3762c was determined using the same conditions mentioned above, with the addition of 1 mM α-ketoglutarate, 200 μM ascorbate and 100 μM FeCl₂ to the buffer. Phosphatase activity was monitored as described above except with the substitution of 4-methylumbelliferyl phosphate for 4MUS.

NBD labeling. His₆-tagged Strep FGE was treated with 1:50 (w/w) TEV protease to remove the N-terminal His₆-tag before NBD labeling and mass spectrometric analyses. Strep FGE (45 μM) was incubated in buffer (25 mM potassium phosphate pH 7.0, 150 mM NaCl) with 1 mM 4-chloro-7-nitrobenz-2-oxa-1,3-diazole (NBD-Cl, Invitrogen) for 30 min at RT (Ellis et al. (1997) Biochemistry 36, 15013-8). The sample was desalted by C₁₈ reversed-phase chromatography and protein-NBD adducts were detected by mass spectrometry (Bruker/Agilent Esquire). Mapping of NBD adducts was performed by digesting NBD-reacted Strep FGE with 1:50 (w/w) trypsin, desalting by C₁₈ reversed-phase chromatography and analyzing the resulting peptide fragments using electrospray ionization Fourier-transform ion cyclotron resonance mass spectrometry (Bruker 9.4T Apex III).

Example 13 Identification and Cloning of an FGE of M. tuberculosis

The Mycobacterium tuberculosis (Mtb) H37Rv open reading frame Rv0712 was identified by BLAST analysis (Altschul, et al. (1997) Nucleic Acids Res 25, 3389-402) to be over 30% identical to the human FGE SUMF1 (Cosma et al. (2003) Cell 113, 445-56; Dierks et al. (2003) Cell 113, 435-44). Recombinant Rv0712 was able to modify a synthetic peptide containing the sulfatase motif as determined by mass spectrometry (FIG. 8, Panel a). The presence of FGly within the substrate was confirmed by treating the modified peptide with biotin hydrazide, which formed a covalent adduct with the peptide via hydrazone formation (FIG. 8, Panel b). Together these data implicate Rv0712 as Mtb's FGE.

Similar to the human genome, the Mtb genome appears to encode only one functional copy of FGE. Therefore, disruption of Rv0712 in Mtb was expected to produce a sulfatase-deficient strain. Rv0712 was disrupted in Mtb H37Rv using homologous recombination and the disruption was confirmed by Southern analysis (FIG. 9). Δfge Mtb was viable and demonstrated no obvious growth defects in vitro.

Sulfatase activity of the Δfge strain was compared to that of wild-type (WT) H37Rv and to the Δfge mutant into which FGE expression was restored by complementation. Crude lysates were generated from these three Mtb strains and global sulfatase activity was determined using the general substrate 4-methylumbelliferyl sulfate (4MUS). The Δfge strain exhibited a substantial, yet surprisingly incomplete, loss of sulfatase activity (FIG. 8, Panel c). While it may have been possible that the residual sulfatase activity resulted from phosphatases acting on 4MUS, the activity of Δfge lysate was not affected when sulfatase activity was monitored in the presence of a cocktail of broad spectrum sulfatase/phosphatase inhibitors. Indeed, activity in lysates from WT and complemented Δfge was reduced by about 40% in the presence of the inhibitor cocktail, matching the sulfatase activity of Δfge in the absence of inhibitors (FIG. 8, Panel c). Because the applied inhibitors are known to inhibit FGE-activated sulfatases (Stankiewicz et al. (1988) Biochemistry 27, 206-12), these data suggest that Mtb possesses FGE-independent sulfatases.

To further verify that promiscuous phosphatases were not responsible for the residual sulfatase activity, phosphatase activity of crude lysates from each strain was monitored using 4-methylumbelliferyl phosphate. All three strains exhibited the same level of phosphatase activity in the absence of inhibitors, but activity was abolished in all strains in the presence of the inhibitors (FIG. 8, Panel d). These data further indicate that phosphatases are not accountable for the residual 4MUS hydrolysis activity observed in the Δfge strain and that FGE-activated sulfatases are responsible for approximately 40% of the total sulfatase activity in Mtb lysate.

The Mtb genome was searched for potential sources of FGE-independent sulfatase activity. The majority of known or putative prokaryotic sulfatases are homologous to eukaryotic sulfatases and contain the sulfatase motif. However, some prokaryotes also have FGE-independent sulfatases that do not require FGly and presumably operate via different mechanisms. These enzymes may not be sensitive to broad-spectrum sulfatase/phosphatase inhibitors. FGE-independent sulfatases are not homologous to FGE-activated sulfatases and have been classified into one of two enzyme families, the metallo-β-lactamases and Fe(II) α-ketoglutarate-dependent dioxygenases₁₈₋₂₀. Based on sequence similarity with known FGE-independent sulfatases from other prokaryotes, Mtb has at least three putative FGE-independent sulfatases encoded by open reading frames Rv2407, Rv3406 and Rv3762c. Recombinant forms of Rv2407, Rv3406 and Rv3762c were expressed in E. coli, but the purified proteins exhibited no activity in the 4MUS assay, indicating that these putative sulfatases are probably not responsible for the residual sulfatase activity in Δfge Mtb (FIG. 10). Considering the lack of sequence similarity among FGE-independent sulfatases, Mtb may have other sulfatases not detectable by BLAST analysis.

Example 14 Structure of Mtb FGE

To better understand the unique enzymatic mechanism and substrate binding characteristics of prokaryotic FGEs, the structure of the Mtb FGE ortholog from Streptomyces coelicolor (Strep) was determined to a resolution of 2.1 Å. The overall topology of the bacterial FGE is remarkably similar to the recently determined human FGE structure (Dierks et al. (2005) Cell 121, 541-52) (FIG. 11A). Similar to human FGE, Strep FGE has low secondary structure content, containing 16% α-helix and 12% β-sheet. Both share the novel “FGE fold,” but the Strep FGE variant contains only one Ca²⁺ ion as determined by coordination geometry and inductively coupled plasma-atomic emission spectroscopy (ICP-AES) (FIG. 12). The human variant is stabilized by two Ca²⁺ ions; this difference is apparently due to a Glu66Ala substitution in Strep FGE that disrupts an appropriate coordination environment (FIG. 3, panel b). ICP-AES data indicate that Mtb FGE lacks both Ca²⁺ ions (FIG. 12), suggesting that the FGE fold does not require stabilization by divalent cations.

The active sites of the prokaryotic and human FGE are remarkably similar. Both are approximately 20 Å in length, 12 Å in width, and 10 Å in depth and can accommodate only 6 of the 13 amino acids that define the sulfatase motif. Considering that the sulfatase motif extends towards the C-terminus of the peptide substrate for another eight residues beyond the core consensus sequence (CXPXR) (FIG. 7, panel b), it is possible that FGE has evolved a secondary binding region to aid in substrate recognition, similar to other proteins such as thrombin and botulinum neurotoxin (Hageman et al. (1974) Arch Biochem Biophys 164:707-15; Breidenbach et al. (2004) Nature 432:925-9). Indeed, when conserved residues between Strep, Mtb, human and other putative FGEs are mapped on the surface of the Strep FGE molecule, a region of high conservation is observed where the C-terminal section of the sulfatase motif could possibly bind (FIG. 11C).
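As an illustration of the core consensus sequence mentioned above, the following minimal Python sketch scans a protein or peptide sequence for CXPXR, where X is any residue. The function name and the use of a simple regular expression are assumptions for illustration; the sketch ignores the flanking residues that contribute to recognition and is not a predictor of FGE substrates.

```python
import re
from typing import List, Tuple

# Core sulfatase consensus CXPXR: Cys, any residue, Pro, any residue, Arg.
CORE_MOTIF = re.compile(r"C.P.R")


def find_core_motifs(sequence: str) -> List[Tuple[int, str]]:
    """Return (zero-based start index, matched text) for each CXPXR match.

    Matches are non-overlapping, which is how re.finditer reports them.
    """
    return [(m.start(), m.group()) for m in CORE_MOTIF.finditer(sequence.upper())]


# The 13-residue assay peptide described in the activity assay.
hits = find_core_motifs("LCSPSRGSLFTGR")
for start, motif in hits:
    print(f"core motif {motif} at position {start + 1} (1-based)")
```

For the assay peptide, the single match begins at the cysteine in the second position, which is the residue converted to FGly.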

FGE is thought to catalyze the oxidation of a thiol to an aldehyde using two conserved cysteine residues within its active site (Dierks et al. (2005) Cell 121, 541-52; Roeser et al. (2006) Proc Natl Acad Sci USA 103, 81-6). These cysteines in Mtb (Cys263 and Cys268) and Strep FGE (Cys272 and Cys277) are required for substrate turnover as serine mutants were unable to generate FGly in vitro (FIG. 8, panel a, FIG. 13, panels c, h, i). Interestingly, the oxidation state of these residues is different between the five monomers within the asymmetric unit. Omit maps showed Cys272 and Cys277 to be engaged in partial disulfides in three of the five Strep FGE monomers within the asymmetric unit (FIGS. 11D-11E, and data not shown). Biochemical confirmation of these partial disulfides was provided by treating native Strep FGE's three solvent-exposed cysteines with the thiol-labeling reagent 4-chloro-7-nitrobenz-2-oxa-1,3-diazole (NBD). Two distinct populations corresponding to Strep FGE with either 1 or 3 NBD adducts were detected by intact-protein mass spectrometry (FIG. 14), confirming that approximately one-third of these two proximal cysteines are linked in a disulfide bond.
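The mass shifts that distinguish the singly and triply labeled populations can be estimated with a small Python sketch. It assumes that each free thiol displaces chloride from NBD-Cl (C6H2ClN3O3) so that every adduct adds the mass of NBD-Cl minus HCl, and it uses average atomic masses; the names are hypothetical, and the sketch does not model sulfenic acid adducts or reproduce the spectra in FIG. 14.

```python
from typing import Dict

# Average atomic masses (Da) for the elements involved.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "Cl": 35.453, "N": 14.007, "O": 15.999}

NBD_CL = {"C": 6, "H": 2, "Cl": 1, "N": 3, "O": 3}  # 4-chloro-7-nitrobenz-2-oxa-1,3-diazole
HCL = {"H": 1, "Cl": 1}                             # lost when a thiol displaces chloride


def formula_mass(formula: Dict[str, int]) -> float:
    """Average mass of a molecular formula given as element -> count."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def nbd_mass_shift(n_adducts: int) -> float:
    """Expected average mass increase for n thiol-NBD adducts on one protein."""
    per_adduct = formula_mass(NBD_CL) - formula_mass(HCL)
    return n_adducts * per_adduct


# One adduct: the two active-site cysteines are tied up in a disulfide.
# Three adducts: all three solvent-exposed cysteines are free thiols.
for n in (1, 3):
    print(f"{n} NBD adduct(s): +{nbd_mass_shift(n):.2f} Da")
```

Under these assumptions the two populations are separated by the mass of two NBD groups, which is well within the resolving power of intact-protein electrospray measurements on a protein of this size.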

In addition to the two active site cysteines, Strep and Mtb FGE also require molecular oxygen for catalysis as no FGly formation was observed in reactions performed in an anaerobic environment (FIG. 13, panel e, and data not shown). As a member of the oxygenase family, FGE might be expected to contain a transition metal, such as Fe or Cu, or an organic cofactor, such as FADH, for activation of molecular oxygen. However, analysis by ICP-AES and x-ray absorption edge scanning indicated that active Mtb and Strep FGE lack all redox active metals (FIG. 12 and data not shown). Additionally, these FGEs do not require addition of metals to function in assays in vitro and can function in the presence of EDTA (FIG. 13, panels b, g). Similarly, UV-visible absorption spectroscopy did not reveal the presence of chromophoric organic cofactors (data not shown). Together with electron density information for Strep FGE, these data indicate that prokaryotic FGEs, similar to human FGE, do not use exogenous cofactors for catalysis.

As an alternate means of activating molecular oxygen, FGE may function similarly to other cofactor-less oxygenases and make unique use of conventional residues₂₅. Absolutely conserved residues within reactive distance from Strep FGE's catalytic cysteine pair include Trp234 and Ser269. Roeser et al. have theorized that Trp234 may function to activate molecular oxygen (Roeser et al. (2006) Proc Natl Acad Sci USA 103:81-6), similar to the proposed mechanism of O₂ reduction by catalytic antibodies₂₆. However, mutation of Trp234 to Phe did not abolish activity (FIG. 13, panel j), indicating that molecular oxygen activation must be achieved by another route. Activity was severely reduced by the Ser269Ala equivalent mutation in Mtb FGE (Ser260Ala) (FIG. 13, panel d), but it is currently unknown how this residue plays a role in FGE's catalytic cycle.

Interestingly, Cys272 itself may be involved in activating molecular oxygen. All five modeled Cys272 residues within the asymmetric unit of Strep FGE have extra electron density extending away from their alternate, non-disulfide bound conformations. Omit maps indicate that this extra density could be modeled as one water molecule or a hydroperoxide moiety with partial occupancy (FIG. 13, panels d, e). NBD-labeling experiments using the Cys277Ser Strep FGE mutant indicate that Cys272 is not a reactive thiol in a subpopulation of enzyme molecules (FIG. 15), suggesting that the extra density corresponds to a moiety covalently bound to Cys272, such as hydroperoxide. Previously published structures of the human FGE have also suggested that this extra density could be hydroperoxide or cysteine sulfenic acid combined with a bound water molecule₂₁. However, the latter fits the observed electron density in Strep FGE poorly. Furthermore, no sulfenic acid was detected when Strep FGE was treated with NBD-chloride (FIG. 15 and FIG. 14). The presence of hydroperoxide modeled with partial occupancy cannot be ruled out based on the observed electron density of Strep FGE. However, mass spectrometric analysis of intact Strep and Mtb FGE revealed no mass anomaly (FIG. 16 and data not shown), suggesting that if Cys272 is modified, the modification is transient or acid labile.

1.-22. (canceled)
23. A recombinant nucleic acid comprising: an expression cassette comprising: a first nucleic acid comprising an aldehyde tag-encoding sequence; and a restriction site positioned 5′ or 3′ of the aldehyde tag-encoding sequence, which restriction site provides for insertion of a second nucleic acid encoding a polypeptide of interest; and a promoter operably linked to the expression cassette to provide for expression of an aldehyde tagged-polypeptide produced by insertion of the second nucleic acid encoding the polypeptide of interest into the restriction site.
24. (canceled)
25. The recombinant nucleic acid of claim 23, wherein the aldehyde tag-encoding sequence encodes a heterologous sulfatase motif having a 2-formylglycine residue, wherein the heterologous sulfatase motif is less than 13 amino acid residues and contains a contiguous sequence of the formula:

(I) X₁(FGly)X₂Z₂X₃R (SEQ ID NO: 1)

wherein FGly is a 2-formylglycine residue; Z₂ is a proline or alanine residue; and X₁, X₂ and X₃ are each independently any amino acid.
26. The recombinant nucleic acid of claim 25, wherein the heterologous sulfatase motif is positioned at a C-terminus of the polypeptide; the heterologous sulfatase motif is present in a terminal loop of the polypeptide; the heterologous sulfatase motif is, when the polypeptide is a transmembrane protein, present at an internal site within an extracellular loop or an intracellular loop; the heterologous sulfatase motif is present at an internal site or at an N-terminus of the polypeptide, and is solvent-accessible when the polypeptide is folded; and/or the heterologous sulfatase motif is present at a site of post-translational modification.
27. The recombinant nucleic acid of claim 25, wherein X₁, X₂, and X₃ are each independently an aliphatic amino acid, a sulfur-containing amino acid, or a polar, uncharged amino acid.
28. The recombinant nucleic acid of claim 25, wherein X₂ and X₃ are each independently S, T, A, V, G, or C.
29. The recombinant nucleic acid of claim 25, wherein X₁ is L, M, V, S or T and X₂ and X₃ are each independently S, T, A, V, G or C.
30. The recombinant nucleic acid of claim 25, wherein X₁ is L, M, V, S or T.
31. The recombinant nucleic acid of claim 30, wherein the heterologous sulfatase motif is L(FGly)TPSR (SEQ ID NO: 62).
32. The recombinant nucleic acid of claim 30, wherein the heterologous sulfatase motif is selected from M(FGly)TPSR (SEQ ID NO: 63), V(FGly)TPSR (SEQ ID NO: 64), L(FGly)SPSR (SEQ ID NO: 65), L(FGly)APSR (SEQ ID NO: 66), L(FGly)VPSR (SEQ ID NO: 67), and L(FGly)GPSR (SEQ ID NO: 68).
33. The recombinant nucleic acid of claim 26, wherein the heterologous sulfatase motif is positioned at an internal sequence of the polypeptide.
34. The recombinant nucleic acid of claim 26, wherein the heterologous sulfatase motif is positioned at a terminal loop, a C-terminus, or an N-terminus of the polypeptide.
35. The recombinant nucleic acid of claim 26, wherein the heterologous sulfatase motif is positioned on a solvent-accessible region of the polypeptide when folded.
36. The recombinant nucleic acid of claim 26, wherein the heterologous sulfatase motif is positioned at a site of post-translational modification of the polypeptide that is native or non-native to the amino acid sequence of the polypeptide.
37. The recombinant nucleic acid of claim 25, wherein the polypeptide comprises an Fc fragment.
38. The recombinant nucleic acid of claim 25, wherein the polypeptide comprises an Fc polypeptide.
39. The recombinant nucleic acid of claim 25, wherein the polypeptide is an antibody.
40. The recombinant nucleic acid of claim 39, wherein the antibody is an IgG antibody.
41. The recombinant nucleic acid of claim 39, wherein the antibody is a humanized antibody.
42. The recombinant nucleic acid of claim 25, wherein the polypeptide comprises an antigen-binding fragment of an antibody.
43. The recombinant nucleic acid of claim 42, wherein the polypeptide comprises a Fab or Fv.
44. The recombinant nucleic acid of claim 25, wherein the polypeptide comprises a single chain antibody.
45. The recombinant nucleic acid of claim 25, wherein the polypeptide is a blood factor.
46. The recombinant nucleic acid of claim 45, wherein the blood factor is Factor VIII.
47. The recombinant nucleic acid of claim 25, wherein the polypeptide is a fibroblast growth factor.
48. The recombinant nucleic acid of claim 25, wherein the polypeptide is a protein vaccine.
49. The recombinant nucleic acid of claim 25, wherein the polypeptide is an enzyme.
50. The recombinant nucleic acid of claim 25, wherein the heterologous sulfatase motif is less than 12 amino acid residues.
51. The recombinant nucleic acid of claim 25, wherein the heterologous sulfatase motif is less than 11 amino acid residues.
52. The recombinant nucleic acid of claim 25, wherein the heterologous sulfatase motif is less than 10 amino acid residues.
53. The recombinant nucleic acid of claim 25, wherein the heterologous sulfatase motif is less than 9 amino acid residues.
54. The recombinant nucleic acid of claim 25, wherein the heterologous sulfatase motif is less than 8 amino acid residues.
55. The recombinant nucleic acid of claim 25, wherein the heterologous sulfatase motif is less than 7 amino acid residues.